Abstract

We develop fast algorithms for solving regression problems on graphs where one is given the value of a function
at some vertices, and must find its smoothest possible extension to all vertices. The extension we compute is the
absolutely minimal Lipschitz extension, and is the limit for large $p$ of $p$-Laplacian regularization. We present an
algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes
an absolutely minimal Lipschitz extension in expected time $\widetilde{O}(mn)$. The latter algorithm has variants that seem
to run much faster in practice. These extensions are particularly amenable to regularization: we can perform
$l_0$-regularization on the given values in polynomial time and $l_1$-regularization on the initial function values and on graph
edge weights in time $\widetilde{O}(m^{3/2})$.

Our definitions and algorithms naturally extend to directed graphs. 


1 Introduction 

We consider a problem in which we are given a weighted undirected graph $G = (V, E, \ell)$ and values $v_0 : T \to \mathbb{R}$
on a subset $T$ of its vertices. We view the weights $\ell$ as indicating the lengths of edges, with shorter length indicating
greater similarity. Our goal is to assign values to every vertex $v \in V \setminus T$ so that the values assigned are as smooth as
possible across edges. A minimal Lipschitz extension of $v_0$ is a vector $v$ that minimizes

$$\max_{(x,y) \in E} (\ell(x,y))^{-1} |v(x) - v(y)| , \tag{1}$$

subject to $v(x) = v_0(x)$ for all $x \in T$. We call such a vector an inf-minimizer. Inf-minimizers are not unique. So,
among inf-minimizers we seek vectors that minimize the second-largest absolute value of $\ell(x,y)^{-1} |v(x) - v(y)|$
across edges, and then the third-largest given that, and so on. We call such a vector $v$ a lex-minimizer. It is also known
as an absolutely minimal Lipschitz extension of $v_0$.

These are the limit of the solution to $p$-Laplacian minimization problems for large $p$, namely the vectors that solve

$$\min_{\substack{v \in \mathbb{R}^n \\ v|_T = v_0|_T}} \sum_{(x,y) \in E} (\ell(x,y))^{-p} |v(x) - v(y)|^p . \tag{2}$$

The use of $p = 2$ was suggested in the foundational paper of Zhu et al. (2003), and is particularly nice because it can
be obtained by solving a system of linear equations in a symmetric diagonally dominant matrix, which can be done
very quickly (Cohen et al. (2014)). The use of larger values of $p$ has been discussed by Alamgir and Luxburg (2011),
and by Bridle and Zhu (2013), but it is much more complicated to compute. The fastest algorithms we know for this
problem require convex programming, and then require very high accuracy to obtain the values at most vertices. By
taking the limit as $p$ goes to infinity, we recover the lex-minimizer, which we will show can be computed quickly.

The lex-minimization problem has a remarkable amount of structure. For example, in uniformly weighted graphs 
the value of the lex-minimizer at every vertex not in T is equal to the average of the minimum and maximum of the 
values at its neighbors. This is analogous to the property of the 2-Laplacian minimizer that the value at every vertex 
not in T equals the average of the values at its neighbors. 

1.1 Contributions 

We first present several important structural properties of lex-minimizers in Section 3.2. As we shall point out, some 
of these were known from previous work, sometimes in restricted settings. We state them generally and prove them 
for completeness. We also prove that the lex-minimizer is as stable as possible under perturbations of $v_0$ (Section 3.1).

The structure of the lex-minimization problem has led us to develop elegant algorithms for its solution. Both the 
algorithms and their analyses could be taught to undergraduates. We believe that these algorithms could be used in 
place of 2-Laplacian minimization in many applications. 

We present algorithms for the following problems. Throughout, $m = |E|$ and $n = |V|$.

Inf-minimization: An algorithm that runs in expected time $O(m + n \log n)$ (Section 4.3).

Lex-minimization: An algorithm that runs in expected time $O(n(m + n \log n))$ (Section 4), along with a variant that
runs quickly in practice (Section 4.4).

$l_1$-regularization of edge lengths for inf-minimization: The problem of minimizing (1) given a limited budget with
which one can increase edge lengths is a linear programming problem. We show how to solve it in time $\widetilde{O}(m^{3/2})$
with an interior point method by using fast Laplacian solvers (Section 8). The same algorithm can accommodate
$l_1$-regularization of the values given in $v_0$.

$l_0$-regularization of vertex values for inf-minimization: We give a polynomial time algorithm for $l_0$-regularization
of the values at vertices. That is, we minimize (1) given a budget of a number of vertices that can be proclaimed
outliers and removed from $T$ (Section 7.1). We solve this problem by reducing it to the problem of computing
minimum vertex covers on transitively closed directed acyclic graphs, a special case of minimum vertex cover
that can be solved in polynomial time.

After any regularization for inf-minimization, we suggest computing the lex-minimizer. We find the result for
$l_0$-regularization of vertex values to be particularly surprising, especially because we prove that the analogous problem
for 2-Laplacian minimization is NP-Hard (Section 7.2).

All of our algorithms extend naturally to directed graphs (Section 5). This is in contrast with the problem of 
minimizing 2-Laplacians on directed graphs, which corresponds to computing electrical flows in networks of resistors 
and diodes, for which fast algorithms are not presently known. 

We present a few experiments on examples demonstrating that the lex-minimizer can overcome known deficiencies
of the 2-Laplacian minimizer (Section 1.2, Figures 1 and 2), as well as a demonstration of the performance of the
directed analog of our algorithms on the WebSpam dataset of Castillo et al. (2006) (Section 6). In the WebSpam problem
we use the link structure of a collection of web sites to flag some sites as spam, given a small number of labeled
sites known to be spam or normal.

1.2 Relation to Prior Work 

We first encountered the idea of using the minimizer of the 2-Laplacian given by (2) for regression and classification
on graphs in the work of Zhu et al. (2003) and Belkin et al. (2004) on semi-supervised learning. These works
transformed learning problems on sets of vectors into problems on graphs by identifying vectors with vertices and
constructing graphs with edges between nearby vectors. One shortcoming of this approach (see Nadler et al. (2009),
Alamgir and Luxburg (2011), Bridle and Zhu (2013)) is that if the number of vectors grows while the number of labeled
vectors remains fixed, then almost all the values of the 2-Laplacian minimizer converge to the mean of the
labels on most natural examples. For example, Nadler et al. (2009) consider sampling points from two Gaussian
distributions centered at 0 and 4 on the real line. They place edges between every pair of points $(x, y)$ with length
$\exp(|x - y|^2 / 2\sigma^2)$ for $\sigma = 0.4$, and provide only the labels $v_0(0) = -1$ and $v_0(4) = 1$. Figure 1 shows the values
of the 2-Laplacian minimizer in red, which are all approximately zero. In contrast, the values of the lex-minimizer,
shown in blue, are smoothly distributed between the labeled points.

Figure 1: Lex vs 2-Laplacian on 1D Gaussian clusters.

Figure 2: kNN graphs on samples from 4D cube.

The “manifold hypothesis” (see Chapelle et al. (2010), Ma and Fu (2011)) holds that much natural data lies near a
low-dimensional manifold and that natural functions we would like to learn on this data are smooth functions on the
manifold. Under this assumption, one should expect lex-minimizers to interpolate well. In contrast, the 2-Laplacian
minimizers degrade (dotted lines) if the number of labeled points remains fixed while the total number of points grows.
In Figure 2, we demonstrate this by sampling many points uniformly from the unit cube in 4 dimensions, forming their
8-nearest neighbor graph, and considering the problem of regressing the first coordinate. We performed 8 experiments,
varying the number of labeled points in $\{50, 100, 500, 1000\}$. Each data point is the mean average $l_1$ error over 100
experiments. The plots for root mean squared error are similar. The standard deviations of the estimates of the mean
are within one pixel, and so are not displayed. The performance of the lex-minimizer (solid lines) does not degrade as
the number of unlabeled points grows.

Analogous to our inf-minimizers, minimal Lipschitz extensions of functions in Euclidean space and over more 
general metric spaces have been studied extensively in Mathematics (Kirszbraun (1934), McShane (1934), Whitney 
(1934)). von Luxburg and Bousquet (2003) employ Lipschitz extensions on metric spaces for classification and relate 
these to Support Vector Machines. Their work inspired improvements in classification and regression in metric spaces 
with low doubling dimension (Gottlieb et al. (2013), Gottlieb et al. (2013b)). Theoretically fast, although not actually 
practical, algorithms have been given for constructing minimal Lipschitz extensions of functions on low-dimensional 
Euclidean spaces (Fefferman (2009a), Fefferman and Klartag (2009), Fefferman (2009b)). Sinop and Grady (2007) 
suggest using inf-minimizers for binary classification problems on graphs. For this special case, where all of the 
given values are either 0 or 1, they present an $O(m + n \log n)$ time algorithm for computing an inf-minimizer. The
case of general given values, which we solve in this paper, is much more complicated. To compensate for the
non-uniqueness of inf-minimizers, they suggest choosing the inf-minimizer that minimizes (2) with $p = 2$. We believe that
the lex-minimizer is a more natural choice. 

The analog of our lex-minimizer over continuous spaces is called the absolutely minimal Lipschitz extension
(AMLE). Starting with the work of Aronsson (1967), there have been several characterizations and proofs of the
existence and uniqueness of the AMLE (Jensen (1993), Crandall et al. (2001), Barles and Busca (2001), Aronsson et al.
(2004)). Many of these results were later extended to general metric spaces, including graphs (Milman (1999),
Peres et al. (2011), Naor and Sheffield (2010), Sheffield and Smart (2010)). However, to the best of our knowledge,
fast algorithms for computing lex-minimizers on graphs were not known. For the special case of undirected,
unweighted graphs, Lazarus et al. (1999) presented both a polynomial-time algorithm and an iterative method. Oberman
(2011) suggested computing the AMLE in Euclidean space by first discretizing the problem and then solving the
corresponding graph problem by an iterative method. However, no run-time guarantees were obtained for either iterative
method.


2 Notation and Basic Definitions 

Lexicographic Ordering. Given a vector $r \in \mathbb{R}^m$, let $\pi_r$ denote a permutation that sorts $r$ in non-increasing order
by absolute value, i.e., $\forall i \in [m - 1]$, $|r(\pi_r(i))| \ge |r(\pi_r(i + 1))|$. Given two vectors $r, s \in \mathbb{R}^m$, we write $r \preceq s$ to
indicate that $r$ is smaller than $s$ in the lexicographic ordering on sorted absolute values, i.e.

$$\exists j \in [m],\ |r(\pi_r(j))| < |s(\pi_s(j))| \ \text{and}\ \forall i \in [j - 1],\ |r(\pi_r(i))| = |s(\pi_s(i))|$$
$$\text{or}\quad \forall i \in [m],\ |r(\pi_r(i))| = |s(\pi_s(i))| .$$

Note that it is possible that $r \preceq s$ and $s \preceq r$ while $r \neq s$. It is a total relation: for every $r$ and $s$ at least one of $r \preceq s$
or $s \preceq r$ is true.
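The ordering above can be illustrated with a minimal Python sketch. The function names `sorted_abs` and `lex_leq` are hypothetical helpers introduced here for illustration only; the sketch relies on the fact that Python compares lists lexicographically, which mirrors the definition once both vectors are sorted by absolute value.

```python
from typing import List, Sequence


def sorted_abs(r: Sequence[float]) -> List[float]:
    """Absolute values of r, sorted in non-increasing order."""
    return sorted((abs(a) for a in r), reverse=True)


def lex_leq(r: Sequence[float], s: Sequence[float]) -> bool:
    """Return True iff r precedes-or-equals s in the lexicographic
    ordering on sorted absolute values (hypothetical helper)."""
    if len(r) != len(s):
        raise ValueError("vectors must have the same length")
    # List comparison is lexicographic: the first position where the
    # sorted profiles differ decides; equal profiles compare as <=.
    return sorted_abs(r) <= sorted_abs(s)
```

For instance, `lex_leq([1, -2], [2, 1])` and `lex_leq([2, 1], [1, -2])` both hold even though the two vectors differ, which is the non-antisymmetry noted above.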


Graphs and Matrices. We will work with weighted graphs. Unless explicitly stated, we will assume that they are
undirected. For a graph $G$, we let $V_G$ be its set of vertices, $E_G$ be its set of edges, and $\ell_G : E_G \to \mathbb{R}_+$ be the
assignment of positive lengths to the edges. We let $|V_G| = n$ and $|E_G| = m$. We assume $\ell_G$ is symmetric, i.e.,
$\ell_G(x, y) = \ell_G(y, x)$. When $G$ is clear from the context, we drop the subscript.

A path $P$ in $G$ is an ordered sequence of (not necessarily distinct) vertices $P = (x_0, x_1, \ldots, x_k)$, such that
$(x_{i-1}, x_i) \in E$ for $i \in [k]$. The endpoints of $P$ are denoted by $\partial_0 P = x_0$, $\partial_1 P = x_k$. The set of interior vertices
of $P$ is defined to be $\mathrm{int}(P) = \{x_i : 0 < i < k\}$. For $0 \le i < j \le k$, we use the notation $P[x_i : x_j]$ to denote the
subpath $(x_i, \ldots, x_j)$. The length of $P$ is $\ell(P) = \sum_{i=1}^{k} \ell(x_{i-1}, x_i)$.

A function $v_0 : V \to \mathbb{R} \cup \{*\}$ is called a voltage assignment (to $G$). A vertex $x \in V$ is a terminal with
respect to $v_0$ iff $v_0(x) \neq *$. The other vertices, for which $v_0(x) = *$, are non-terminals. We let $T(v_0)$ denote the
set of terminals with respect to $v_0$. If $T(v_0) = V$, we call $v_0$ a complete voltage assignment (to $G$). We say that an
assignment $v : V \to \mathbb{R} \cup \{*\}$ extends $v_0$ if $v(x) = v_0(x)$ for all $x$ such that $v_0(x) \neq *$.

Given an assignment $v_0 : V \to \mathbb{R} \cup \{*\}$, and two terminals $x, y \in T(v_0)$ for which $(x, y) \in E$, we define the
gradient on $(x, y)$ due to $v_0$ to be

$$\mathrm{grad}_G[v_0](x, y) = \frac{v_0(x) - v_0(y)}{\ell(x, y)} .$$

It may be useful to view $\mathrm{grad}_G[v_0](x, y)$ as the current in the edge $(x, y)$ induced by voltages $v_0$. When $v_0$ is a
complete voltage assignment, we interpret $\mathrm{grad}_G[v_0]$ as a vector in $\mathbb{R}^m$, with one entry for each edge. However, for
convenience, we define $\mathrm{grad}_G[v_0](x, y) = -\mathrm{grad}_G[v_0](y, x)$. When $G$ is clear from the context, we drop the subscript.

A graph $G$ along with a voltage assignment $v$ to $G$ is called a partially-labeled graph, denoted $(G, v)$. We say
that a partially-labeled graph $(G, v_0)$ is a well-posed instance if for every maximal connected component $H$ of $G$, we
have $T(v_0) \cap V_H \neq \emptyset$.

A path $P$ in a partially-labeled graph $(G, v_0)$ is called a terminal path if both endpoints are terminals. We define
$\nabla P(v_0)$ to be its gradient:

$$\nabla P(v_0) = \frac{v_0(\partial_0 P) - v_0(\partial_1 P)}{\ell(P)} .$$

If $P$ contains no terminal-terminal edges (and hence, contains at least one non-terminal), it is a free terminal path.


Lex-Minimization. An instance of the Lex-Minimization problem is described by a partially-labeled graph
$(G, v_0)$. The objective is to compute a complete voltage assignment $v : V_G \to \mathbb{R}$ extending $v_0$ that lex-minimizes
$\mathrm{grad}[v]$.

Definition 2.1 (Lex-minimizer) Given a partially-labeled graph $(G, v_0)$, we define $\mathrm{lex}_G[v_0]$ to be a complete voltage
assignment to $V$ that extends $v_0$, and such that for every other complete assignment $v' : V_G \to \mathbb{R}$ that extends $v_0$, we
have $\mathrm{grad}_G[\mathrm{lex}_G[v_0]] \preceq \mathrm{grad}_G[v']$. That is, $\mathrm{lex}_G[v_0]$ achieves a lexicographically-minimal gradient assignment to the
edges.

We call $\mathrm{lex}_G[v_0]$ the lex-minimizer for $(G, v_0)$. Note that if $T(v_0) = V_G$, then trivially, $\mathrm{lex}_G[v_0] = v_0$.

3 Basic Properties of Lex-Minimizers


Lazarus et al. (1999) established that lex-minimizers in unweighted and undirected graphs exist, are unique, and may 
be computed by an elementary meta-algorithm. We state and prove these facts for undirected weighted graphs, and 
defer the discussion of the directed case to Section 5. We also state for directed and weighted graphs characterizations 
of lex-minimizers that were established by Peres et al. (2011), Naor and Sheffield (2010) and Sheffield and Smart 
(2010) for unweighted graphs. These results are essential for the analyses of our algorithms. We defer most proofs to 
Appendix A. 

Definition 3.1 A steepest fixable path in an instance $(G, v_0)$ is a free terminal path $P$ that has the largest gradient
$\nabla P(v_0)$ amongst such paths.

Observe that a steepest fixable path with $\nabla P(v_0) \neq 0$ must be a simple path.

Definition 3.2 Given a steepest fixable path $P$ in an instance $(G, v_0)$, we define $\mathrm{fix}_G[v_0, P] : V_G \to \mathbb{R} \cup \{*\}$ to be the
voltage assignment defined as follows

$$\mathrm{fix}_G[v_0, P](x) = \begin{cases} v_0(\partial_0 P) - \nabla P(v_0) \cdot \ell_G(P[\partial_0 P : x]) & x \in \mathrm{int}(P) \setminus T(v_0), \\ v_0(x) & \text{otherwise.} \end{cases}$$

We say that the vertices $x \in \mathrm{int}(P)$ are fixed by the operation $\mathrm{fix}[v_0, P]$. If we define $v_1 = \mathrm{fix}_G[v_0, P]$, where
$P = (x_0, \ldots, x_r)$ is the steepest fixable path in $(G, v_0)$, then it is easy to argue that for every $i \in [r]$, we have
$\mathrm{grad}[v_1](x_{i-1}, x_i) = \nabla P$ (see Lemma A.5). The meta-algorithm Meta-Lex, spelled out as Algorithm 1, entails
repeatedly fixing steepest fixable paths. While it is possible to have multiple steepest fixable paths, the result of fixing
all of them does not depend on the order in which they are fixed.
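As a small illustration of the fix operation, the following Python sketch interpolates voltages linearly along a given path. It assumes, purely for illustration, that edge lengths are stored in a dictionary keyed by vertex pairs and that non-terminals are simply absent from the voltage dictionary (instead of carrying the value $*$); the names `edge_length`, `path_gradient` and `fix_path` are hypothetical and do not come from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Vertex = Hashable
Lengths = Dict[Tuple[Vertex, Vertex], float]


def edge_length(length: Lengths, a: Vertex, b: Vertex) -> float:
    """Look up a symmetric edge length stored under either orientation."""
    return length[(a, b)] if (a, b) in length else length[(b, a)]


def path_gradient(path: List[Vertex], length: Lengths,
                  v0: Dict[Vertex, float]) -> float:
    """Gradient of a terminal path: voltage drop divided by path length."""
    total = sum(edge_length(length, a, b) for a, b in zip(path, path[1:]))
    return (v0[path[0]] - v0[path[-1]]) / total


def fix_path(path: List[Vertex], length: Lengths,
             v0: Dict[Vertex, float]) -> Dict[Vertex, float]:
    """Return a copy of v0 where interior non-terminals of the path get
    linearly interpolated voltages (assumption: absent key = non-terminal)."""
    g = path_gradient(path, length, v0)
    v1 = dict(v0)
    travelled = 0.0
    for a, b in zip(path, path[1:-1]):  # b ranges over interior vertices
        travelled += edge_length(length, a, b)
        if b not in v0:
            v1[b] = v0[path[0]] - g * travelled
    return v1


# Path a-b-c-d with lengths 1, 2, 1 and terminals a (voltage 4) and d (voltage 0).
length = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0}
v1 = fix_path(["a", "b", "c", "d"], length, {"a": 4.0, "d": 0.0})
```

In this example the path gradient is 1, and every edge of the fixed path ends up with gradient exactly 1, as the text states.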

Theorem 3.3 Given a well-posed instance $(G, v_0)$, the meta-algorithm Meta-Lex, which repeatedly fixes steepest
fixable paths, produces the unique lex-minimizer extending $v_0$.
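To make the meta-algorithm concrete, here is a deliberately naive Python sketch of repeatedly fixing steepest fixable paths. It is not the paper's Algorithm 1 or its fast steepest-path routine: the steepest path is found by running Dijkstra from every fixed vertex through non-terminals only, which is slow but easy to check. The adjacency-list format (vertex to list of `(neighbor, length)` pairs) and the names `shortest_free_paths` and `meta_lex` are assumptions made for this illustration.

```python
import heapq
from itertools import count


def shortest_free_paths(adj, fixed, s):
    """Dijkstra from fixed vertex s that walks only through non-terminals;
    other fixed vertices are reached as endpoints, never via a direct
    terminal-terminal edge."""
    dist, parent = {s: 0.0}, {s: None}
    tie = count()  # tie-breaker so vertices themselves are never compared
    heap = [(0.0, next(tie), s)]
    done = set()
    while heap:
        dx, _, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        if x in fixed and x != s:
            continue  # a terminal ends the path
        for y, l in adj[x]:
            if x == s and y in fixed:
                continue  # skip terminal-terminal edges
            if dx + l < dist.get(y, float("inf")):
                dist[y], parent[y] = dx + l, x
                heapq.heappush(heap, (dx + l, next(tie), y))
    return dist, parent


def meta_lex(adj, v0):
    """Naive sketch: fix a steepest fixable path until all vertices are set."""
    v = dict(v0)
    while len(v) < len(adj):
        best = None  # (gradient, s, t, dist, parent)
        for s in list(v):
            dist, parent = shortest_free_paths(adj, v, s)
            for t, d in dist.items():
                if t != s and t in v:
                    g = (v[s] - v[t]) / d
                    if best is None or g > best[0]:
                        best = (g, s, t, dist, parent)
        if best is None or best[0] <= 0:
            # All remaining free terminal paths are flat: copy values outward.
            changed = True
            while changed:
                changed = False
                for x in adj:
                    if x not in v:
                        known = [v[y] for y, _ in adj[x] if y in v]
                        if known:
                            v[x] = known[0]
                            changed = True
            return v
        g, s, t, dist, parent = best
        x = parent[t]
        while x != s:  # walk back along the path, fixing interior vertices
            v[x] = v[s] - g * dist[x]
            x = parent[x]
    return v
```

On the path a-b-c-d with lengths 1, 2, 1, a pendant vertex e attached to b, and terminals a = 4 and d = 0, the sketch first fixes b = 3 and c = 1 along the steepest path and then sets e = 3 via the flat path b-e-b.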


Corollary 3.4 Given a well-posed instance $(G, v_0)$ such that $T(v_0) \neq V_G$, let $P$ be a steepest fixable path in $(G, v_0)$.
Then, $(G, \mathrm{fix}[v_0, P])$ is also a well-posed instance, and $\mathrm{lex}_G[\mathrm{fix}[v_0, P]] = \mathrm{lex}_G[v_0]$.


Since a lex-minimal element must be an inf-minimizer, we also obtain the following corollary, that can also be 
proved using LP duality. 

Lemma 3.5 Suppose we have a well-posed instance $(G, v_0)$. Then, there exists a complete voltage assignment $v$
extending $v_0$ such that $\|\mathrm{grad}[v]\|_\infty \le \alpha$, iff every terminal path $P$ in $(G, v_0)$ satisfies $\nabla P(v_0) \le \alpha$.


3.1 Stability 

The following theorem states that $\mathrm{lex}_G[v_0]$ is monotonic with respect to $v_0$ and that it respects scaling and translation of
$v_0$.

Theorem 3.6 Let $(G, v_0)$ be a well-posed instance with $T := T(v_0)$ as the set of terminals. Then the following
statements hold.

1. For any $c, d \in \mathbb{R}$, let $v_1$ be a partial assignment with terminals $T(v_1) = T$ and $v_1(t) = c v_0(t) + d$ for all $t \in T$.
Then, $\mathrm{lex}_G[v_1](i) = c \cdot \mathrm{lex}_G[v_0](i) + d$ for all $i \in V_G$.

2. Let $v_1$ be a partial assignment with terminals $T(v_1) = T$. Suppose further that $v_1(t) \ge v_0(t)$ for all $t \in T$. Then,
$\mathrm{lex}_G[v_1](i) \ge \mathrm{lex}_G[v_0](i)$ for all $i \in V_G$.

As a corollary, the above theorem gives a nice stability property that lex-minimal elements satisfy.

Corollary 3.7 Given well-posed instances $(G, v_0)$, $(G, v_1)$ such that $T := T(v_0) = T(v_1)$, let $\epsilon := \max_{t \in T} |v_0(t) - v_1(t)|$.
Then $|\mathrm{lex}_G[v_0](i) - \mathrm{lex}_G[v_1](i)| \le \epsilon$ for all $i \in V_G$.

3.2 Alternate Characterizations

There are at least two other seemingly disparate definitions that are equivalent to lex-minimal voltages. 

$l_p$-norm Minimizers. As mentioned in the introduction, for a well-posed instance $(G, v_0)$ the lex-minimizer is also
the limit of $l_p$-minimizers. This follows from existing results about the limit of $l_p$-minimizers (Egger and Huotari
(1990)) in affine spaces, since $\{\mathrm{grad}[v] \mid v \text{ is complete}, v \text{ extends } v_0\}$ forms an affine subspace of $\mathbb{R}^m$. Thus, we have
the following theorem:

Theorem 3.8 (Limit of $l_p$-minimizers, follows from Egger and Huotari (1990)) For any $p \in (1, \infty)$, given a well-posed
instance $(G, v_0)$ define $v_p$ to be the unique complete voltage assignment extending $v_0$ and minimizing $\|\mathrm{grad}[v]\|_p$,
i.e.

$$v_p = \arg\min_{\substack{v \text{ is complete} \\ v \text{ extends } v_0}} \|\mathrm{grad}[v]\|_p .$$

Then, $\lim_{p \to \infty} v_p = \mathrm{lex}_G[v_0]$.

Max-Min Gradient Averaging. Consider a well-posed instance $(G, v_0)$, and a complete voltage assignment $v$
extending $v_0$. If $G$ is such that $\ell(e) = 1$ for all $e \in E_G$, it is easy to see that $\mathrm{lex} = \mathrm{lex}_G[v_0]$ satisfies the following simple
condition for all $x \in V_G \setminus T(v_0)$,

$$\mathrm{lex}(x) = \frac{1}{2} \left( \max_{(x,y) \in E_G} \mathrm{lex}(y) + \min_{(x,z) \in E_G} \mathrm{lex}(z) \right) .$$

This condition should be contrasted with the optimality condition for $l_2$-regularization on these instances, which gives
that for all non-terminals $x$, the optimal voltage $v$ satisfies $v(x) = \frac{1}{\deg(x)} \sum_{y : (x,y) \in E_G} v(y)$.

To prove the above claim, consider locally changing lex at $x$ and observe that the gradients of edges not incident
at $x$ remain unchanged, and at least one of the edges incident at $x$ will have a strictly larger gradient, contradicting
lex-minimality. For general graphs, this condition of local optimality can still be characterized by a simple max-min
gradient averaging property as described below.


Definition 3.9 (Max-Min Gradient Averaging) Given a well-posed instance $(G, v_0)$, and a complete voltage
assignment $v$ extending $v_0$, we say that $v$ satisfies the max-min gradient averaging property (w.r.t. $(G, v_0)$) if for every
$x \in V_G \setminus T(v_0)$, we have

$$\max_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y) = - \min_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y) .$$

As stated in the theorem below, $\mathrm{lex}_G[v_0]$ is the unique assignment satisfying the max-min gradient averaging property.
Sheffield and Smart (2010) proved a variant of this statement for weighted graphs. For completeness, we present a
proof in the appendix.


Theorem 3.10 Given a well-posed instance $(G, v_0)$, $\mathrm{lex}_G[v_0]$ satisfies the max-min gradient averaging property.
Moreover, it is the unique complete voltage assignment extending $v_0$ that satisfies this property w.r.t. $(G, v_0)$.


An advantage of this characterization is that it can be verified quickly. This is particularly useful for implementations 
for computing the lex-minimizer. 
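A verification pass of this kind might look like the following Python sketch. It assumes a well-posed instance given as an adjacency list (vertex to list of `(neighbor, length)` pairs, every non-terminal having at least one neighbor); the function name `satisfies_max_min` and the tolerance parameter are illustrative choices, not part of the paper.

```python
def satisfies_max_min(adj, v, terminals, tol=1e-9):
    """Check the max-min gradient averaging property at every non-terminal.

    adj: dict vertex -> list of (neighbor, length)   (assumed format)
    v: dict vertex -> voltage, defined on all vertices
    terminals: set of terminal vertices
    """
    for x, nbrs in adj.items():
        if x in terminals:
            continue
        grads = [(v[x] - v[y]) / length for y, length in nbrs]
        # Steepest outgoing gradient must cancel the steepest incoming one.
        if abs(max(grads) + min(grads)) > tol:
            return False
    return True


# Path a-b-c-d with lengths 1, 2, 1 and terminals a = 4, d = 0.
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("b", 2.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
good = {"a": 4.0, "b": 3.0, "c": 1.0, "d": 0.0}
bad = {"a": 4.0, "b": 2.0, "c": 1.0, "d": 0.0}
```

The check touches each edge twice, so it runs in time linear in the size of the graph.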


4 Algorithms 

We now sketch the ideas behind our algorithms and give precise statements of our results. A full description of all the 
algorithms is included in the appendix. 

We define the pressure of a vertex to be the gradient of the steepest terminal path through it:

$$\mathrm{pressure}[v_0](x) = \max\{\nabla P(v_0) \mid P \text{ is a terminal path in } (G, v_0) \text{ and } x \in P\} .$$

Observe that in a graph with no terminal-terminal edges, a free terminal path is a steepest fixable path iff its gradient
is equal to the highest pressure amongst all vertices. Moreover, vertices that lie on steepest fixable paths are exactly
the vertices with the highest pressure. For a given $\alpha > 0$, in order to identify vertices with pressure exceeding $\alpha$, we
compute vectors $\mathrm{vHigh}[\alpha](x)$ and $\mathrm{vLow}[\alpha](x)$ defined as follows in terms of dist, the metric on $V$ induced by $\ell$:

$$\mathrm{vLow}[\alpha](x) = \min_{t \in T(v_0)} \{v_0(t) + \alpha \cdot \mathrm{dist}(x, t)\}, \qquad \mathrm{vHigh}[\alpha](x) = \max_{t \in T(v_0)} \{v_0(t) - \alpha \cdot \mathrm{dist}(t, x)\} .$$

4.1 Lex-minimization on Star Graphs 

We first consider the problem of computing the lex-minimizer on a star graph in which every vertex but the center is a 
terminal. This special case is a subroutine in the general algorithm, and also motivates some of our techniques. 

Let $x$ be the center vertex, $T$ be the set of terminals, and all edges be of the form $(x, t)$ with $t \in T$. The initial
voltage assignment is given by $v : T \to \mathbb{R}$, and we abbreviate $\mathrm{dist}(x, t)$ by $d(t) = \ell(x, t)$. From Corollary 3.4 we know
that we can determine the value of the lex-minimizer at $x$ by finding a steepest fixable path. By definition, we need to
find $t_1, t_2 \in T$ that maximize the gradient of the path from $t_1$ to $t_2$, $\nabla(t_1, t_2) = \frac{v(t_1) - v(t_2)}{d(t_1) + d(t_2)}$. As observed above, this
is equivalent to finding a terminal with the highest pressure. We now present a simple randomized algorithm for this
problem that runs in expected linear time.

Given a terminal $t_1$, we can compute its pressure $\alpha$ along with the terminal $t_2$ such that $|\nabla(t_1, t_2)| = \alpha$ in time
$O(|T|)$ by scanning over the terminals in $T$. Consider doing this for a random terminal $t_1$. We will show that in linear
time one can then find the subset of terminals $T' \subset T$ whose pressure is greater than $\alpha$. Assuming this, we complete
the analysis of the algorithm. If $T' = \emptyset$, $t_1$ is a vertex with highest pressure. Hence the path from $t_1$ to $t_2$ is a steepest
fixable path, and we return $(t_1, t_2)$. If $T' \neq \emptyset$, the terminal with the highest pressure must be in $T'$, and we recurse by
picking a new random $t_1 \in T'$. As the size of $T'$ will halve in expectation at each iteration, the expected time of the
algorithm on the star is $O(|T|)$.

To determine which terminals have pressure exceeding $\alpha$, we observe that the condition $\exists t_2 : \alpha < \nabla(t_1, t_2) =
\frac{v(t_1) - v(t_2)}{d(t_1) + d(t_2)}$ is equivalent to $\exists t_2 : v(t_2) + \alpha d(t_2) < v(t_1) - \alpha d(t_1)$. This, in turn, is equivalent to $\mathrm{vLow}[\alpha](x) < v(t_1) - \alpha d(t_1)$.
We can compute $\mathrm{vLow}[\alpha](x)$ in deterministic $O(|T|)$ time. Similarly, we can check if $\exists t_2 : \alpha < \nabla(t_2, t_1)$ by
checking if $\mathrm{vHigh}[\alpha](x) > v(t_1) + \alpha d(t_1)$. Thus, in linear time, we can compute the set $T'$ of terminals with pressure
exceeding $\alpha$. The above algorithm is described in Algorithm 10.

Theorem 4.1 Given a set of terminals $T$, initial voltages $v : T \to \mathbb{R}$, and distances $d : T \to \mathbb{R}_+$, StarSteepestPath($T$, $v$, $d$)
returns $(t_1, t_2)$ maximizing $\frac{v(t_1) - v(t_2)}{d(t_1) + d(t_2)}$, and runs in expected time $O(|T|)$.
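The randomized pruning on the star can be sketched in Python as below. This is an illustrative reading of the description above, not the paper's Algorithm 10: the function name `star_steepest_path`, the dictionary inputs, and the extra bookkeeping (tracking the best pair seen and always dropping the sampled terminal, to guard against floating-point ties) are assumptions of this sketch.

```python
import random


def star_steepest_path(v, d, rng=None):
    """Return (t1, t2, g) with g = (v[t1] - v[t2]) / (d[t1] + d[t2]) maximal.

    v: dict terminal -> voltage, d: dict terminal -> positive distance to
    the center. Sketch of the randomized pruning idea; hypothetical API.
    """
    rng = rng or random.Random()
    active = list(v)
    if len(active) < 2:
        raise ValueError("need at least two terminals")
    best = None  # (gradient, high terminal, low terminal)
    while len(active) >= 2:
        t1 = rng.choice(active)
        # Pressure of t1 within the active set, found by one linear scan.
        t2 = max((t for t in active if t != t1),
                 key=lambda t: abs(v[t1] - v[t]) / (d[t1] + d[t]))
        alpha = abs(v[t1] - v[t2]) / (d[t1] + d[t2])
        if best is None or alpha > best[0]:
            best = (alpha, t1, t2) if v[t1] >= v[t2] else (alpha, t2, t1)
        # vLow and vHigh at the center for this alpha.
        v_low = min(v[t] + alpha * d[t] for t in active)
        v_high = max(v[t] - alpha * d[t] for t in active)
        # Keep only terminals whose pressure exceeds alpha.
        active = [t for t in active if t != t1 and
                  (v_low < v[t] - alpha * d[t] or v_high > v[t] + alpha * d[t])]
    return best[1], best[2], best[0]
```

Both endpoints of a steepest pair have the maximum pressure, so they survive every pruning step until one of them is sampled; this is why restricting later scans to the surviving terminals is safe.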

4.2 Lex-minimization on General Graphs 

Theorem 3.3 tells us that Meta-Lex will compute lex-minimizers given an algorithm for finding a steepest fixable
path in $(G, v_0)$. Recall that finding a steepest fixable path is equivalent to finding a path with gradient equal to the
highest pressure amongst all vertices. In this section, we show how to do this in expected time $O(m + n \log n)$.

We describe an algorithm VertexSteepestPath that finds a terminal path $P$ through any vertex $x$ such that
$\nabla P(v_0) = \mathrm{pressure}[v_0](x)$ in expected $O(m + n \log n)$ time. Using Dijkstra’s algorithm, we compute $\mathrm{dist}(x, t)$ for
all $t \in T$. If $x \in T(v_0)$, then there must be a terminal path $P$ that starts at $x$ that has $\nabla P(v_0) = \mathrm{pressure}[v_0](x)$. To
compute such a $P$ we examine all $t \in T(v_0)$ in $O(|T|)$ time to find the $t$ that maximizes $|\nabla(x, t)| = \frac{|v_0(x) - v_0(t)|}{\mathrm{dist}(x, t)}$, and
then return a shortest path between $x$ and that $t$.

If $x \notin T(v_0)$, then the steepest path through $x$ between terminals $t_1$ and $t_2$ must consist of shortest paths between
$x$ and $t_1$ and between $x$ and $t_2$. Thus, we can reduce the problem to that of finding the steepest path in a star graph
where $x$ is the only non-terminal and is connected to each terminal $t$ by an edge of length $\mathrm{dist}(x, t)$. By Theorem 4.1,
we can find this steepest path in $O(|T|)$ expected time. The above algorithm is formally described as Algorithm 9.

Theorem 4.2 Given a well-posed instance $(G, v_0)$, and a vertex $x \in V_G$, VertexSteepestPath($G$, $v_0$, $x$) returns
a terminal path $P$ through $x$ such that $\nabla P(v_0) = \mathrm{pressure}[v_0](x)$, in $O(m + n \log n)$ expected time.

As in the algorithm for the star graph, we need to identify the vertices whose pressure exceeds a given $\alpha$. For a fixed
$\alpha$, we can compute $\mathrm{vLow}[\alpha](x)$ and $\mathrm{vHigh}[\alpha](x)$ for all $x \in V_G$ using a simple modification of Dijkstra’s algorithm in
$O(m + n \log n)$ time. We describe the algorithms CompVHigh, CompVLow for these tasks in Algorithms 3 and 4.
The following lemma encapsulates the usefulness of vLow and vHigh.

Lemma 4.3 For every $x \in V_G$, $\mathrm{pressure}[v_0](x) > \alpha$ iff $\mathrm{vHigh}[\alpha](x) > \mathrm{vLow}[\alpha](x)$.

It immediately follows that the algorithm CompHighPressGraph($G$, $v_0$, $\alpha$) described in Algorithm 6 computes
the vertex induced subgraph on the vertex set $\{x \in V_G \mid \mathrm{pressure}[v_0](x) > \alpha\}$.
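One way to realize the modified Dijkstra computation is sketched below in Python. The sketch seeds a multi-source Dijkstra with the terminal voltages and scales edge lengths by $\alpha$; vHigh is obtained by negating the voltages. It uses a binary heap, so it runs in $O(m \log n)$ rather than the $O(m + n \log n)$ bound stated above, and the function names and the adjacency-list format are assumptions of this illustration rather than the paper's Algorithms 3, 4 and 6.

```python
import heapq
from itertools import count


def comp_v_low(adj, v0, alpha):
    """vLow[alpha](x) = min over terminals t of v0[t] + alpha * dist(x, t).

    adj: dict vertex -> list of (neighbor, length); v0: dict terminal -> voltage.
    Assumes every vertex is connected to some terminal (well-posed instance).
    """
    low = dict(v0)
    tie = count()  # tie-breaker so vertices themselves are never compared
    heap = [(val, next(tie), t) for t, val in v0.items()]
    heapq.heapify(heap)
    done = set()
    while heap:
        val, _, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        for y, length in adj[x]:
            cand = val + alpha * length
            if y not in done and cand < low.get(y, float("inf")):
                low[y] = cand
                heapq.heappush(heap, (cand, next(tie), y))
    return low


def comp_v_high(adj, v0, alpha):
    """vHigh[alpha](x) = max over terminals t of v0[t] - alpha * dist(t, x)."""
    neg = comp_v_low(adj, {t: -val for t, val in v0.items()}, alpha)
    return {x: -val for x, val in neg.items()}


def high_pressure_vertices(adj, v0, alpha):
    """Vertices with pressure exceeding alpha, via the vHigh > vLow test."""
    low, high = comp_v_low(adj, v0, alpha), comp_v_high(adj, v0, alpha)
    return {x for x in adj if high[x] > low[x]}
```

On the path a-b-c-d with lengths 1, 2, 1 and terminals a = 4, d = 0, the steepest gradient is 1, so every vertex has pressure above $\alpha = 0.5$ and none has pressure above $\alpha = 1$.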

We can combine these algorithms into an algorithm SteepestPath that finds the steepest fixable path in $(G, v_0)$
in $O(m + n \log n)$ expected time. We may assume that there are no terminal-terminal edges in $G$. We sample an edge
$(x_1, x_2)$ uniformly at random from $E_G$, and a vertex $x_3$ uniformly at random from $V_G$. For $i = 1, 2, 3$, we compute
the steepest terminal path $P_i$ containing $x_i$. By Theorem 4.2, this can be done in $O(m + n \log n)$ expected time. Let $\alpha$
be the largest gradient $\max_i \nabla P_i$. As mentioned above, we can identify $G'$, the induced subgraph on vertices $x$ with
pressure exceeding $\alpha$, in $O(m + n \log n)$ time. If $G'$ is empty, we know that the path $P_i$ with largest gradient is a
steepest fixable path. If not, a steepest fixable path in $(G, v_0)$ must be in $G'$, and hence we can recurse on $G'$. Since
we picked a uniformly random edge, and a uniformly random vertex, the expected size of $G'$ is at most half that of $G$.
Thus, we obtain an expected running time of $O(m + n \log n)$. This algorithm is described in detail in Algorithm 7.

Theorem 4.4 Given a well-posed instance $(G, v_0)$ with $E_G \cap (T(v_0) \times T(v_0)) = \emptyset$, SteepestPath($G$, $v_0$) returns
a steepest fixable path in $(G, v_0)$, and runs in $O(m + n \log n)$ expected time.

By using SteepestPath in Meta-Lex, we get the algorithm CompLexMin, shown in Algorithm 1. From Theorem 3.3 and
Theorem 4.4, we immediately get the following corollary.

Corollary 4.5 Given a well-posed instance $(G, v_0)$ as input, algorithm CompLexMin computes a lex-minimizing
assignment that extends $v_0$ in $O(n(m + n \log n))$ expected time.

4.3 Linear-time Algorithm for Inf-minimization 

Given the algorithms in the previous section, it is straightforward to construct an infinity minimizer. Let $\alpha^*$ be the
gradient of the steepest terminal path. From Lemma 3.5, we know that the norm of the inf-minimizer is $\alpha^*$. Considering
all trivial terminal paths (terminal-terminal edges), and using SteepestPath, we can compute $\alpha^*$ in randomized
$O(m + n \log n)$ time. It is well known (McShane (1934); Whitney (1934)) that $v_1 = \mathrm{vLow}[\alpha^*]$ and $v_2 = \mathrm{vHigh}[\alpha^*]$ are
inf-minimizers. It is also known that $\frac{1}{2}(v_1 + v_2)$ is the inf-minimizer that minimizes the maximum $l_\infty$-norm distance
to all inf-minimizers. In the case of path graphs, this was observed by Gaffney and Powell (1976) and independently
by Micchelli et al. (1976). For completeness, the algorithm is presented as Algorithm 5, and we have the following
result.

Theorem 4.6 Given a well-posed instance $(G, v_0)$, CompInfMin($G$, $v_0$) returns a complete voltage assignment $v$
for $G$ extending $v_0$ that minimizes $\|\mathrm{grad}[v]\|_\infty$, and runs in randomized $O(m + n \log n)$ time.
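The construction of the averaged inf-minimizer can be checked on small inputs with the brute-force Python sketch below. It deliberately swaps the paper's fast routines for all-pairs shortest paths (Floyd-Warshall), so it runs in cubic time and is meant only to illustrate the formula $\frac{1}{2}(\mathrm{vLow}[\alpha^*] + \mathrm{vHigh}[\alpha^*])$; the function names and input format are assumptions, and the graph is assumed connected.

```python
def all_pairs_dist(vertices, edges):
    """Floyd-Warshall on an undirected graph; edges: dict (a, b) -> length."""
    inf = float("inf")
    dist = {(a, b): (0.0 if a == b else inf) for a in vertices for b in vertices}
    for (a, b), length in edges.items():
        dist[a, b] = min(dist[a, b], length)
        dist[b, a] = min(dist[b, a], length)
    for k in vertices:
        for a in vertices:
            for b in vertices:
                if dist[a, k] + dist[k, b] < dist[a, b]:
                    dist[a, b] = dist[a, k] + dist[k, b]
    return dist


def inf_minimizer(vertices, edges, v0):
    """Return (alpha_star, v) where v averages vLow and vHigh at alpha_star.

    Brute-force illustration only; assumes a connected graph.
    """
    dist = all_pairs_dist(vertices, edges)
    terms = list(v0)
    # Steepest terminal path gradient: best voltage drop per unit distance.
    alpha = max(((v0[s] - v0[t]) / dist[s, t]
                 for s in terms for t in terms if s != t), default=0.0)
    v = {}
    for x in vertices:
        low = min(v0[t] + alpha * dist[x, t] for t in terms)
        high = max(v0[t] - alpha * dist[t, x] for t in terms)
        v[x] = 0.5 * (low + high)
    return alpha, v


vertices = ["a", "b", "c", "d", "e"]
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "d"): 1.0, ("b", "e"): 1.0}
alpha_star, v = inf_minimizer(vertices, edges, {"a": 4.0, "d": 0.0})
```

Because vLow and vHigh both agree with $v_0$ on terminals when $\alpha = \alpha^*$, their average extends $v_0$, and no edge of the result has gradient larger than $\alpha^*$ in absolute value.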

4.4 Faster Algorithms for Lex-minimization 

The lex-minimizer has additional structure that allows one to compute it by more efficient algorithms. One observation
that leads to a faster implementation is that fixing a steepest fixable path does not increase the pressure at vertices,
provided that one appropriately ignores terminal-terminal edges. Thus, if $G^{(\alpha)}$ is a subgraph that we identified with
pressure greater than $\alpha$, we can iteratively fix all steepest fixable paths $P$ in $G^{(\alpha)}$ with $\nabla P > \alpha$. Another simple
observation is that if $G^{(\alpha)}$ is disconnected, we can simply recurse on each of the connected components. A complete
description of the algorithm CompFastLexMin based on these ideas is given in Algorithm 11. The algorithm
provably computes $\mathrm{lex}_G[v_0]$, and it is possible to implement it so that the space requirement is only $O(m + n)$.
Although we are unable to prove theoretical bounds on the running time that are better than $O(n(m + n \log n))$,
it runs extremely quickly in practice. We used it to perform the experiments in this paper. For random regular
graphs and Delaunay graphs, with $n = 0.5 \times 10^6$ vertices and around 2 million edges ($m \approx 1.5$ to $2 \times 10^6$), it
takes a couple of minutes on a 2009 MacBook Pro. Similar times are observed for other model graphs of this
size such as random regular graphs and real world networks. An implementation of this algorithm may be found
at https://github.com/danspielman/YINSlex.


5 Directed Graphs 

Our definitions and algorithms, including those for regularization, extend to directed graphs with only small
modifications. We view directed edges as diodes and only consider potential differences in the direction of the edge. For
a complete voltage assignment $v$ on the vertices of a directed graph $G$, we define the directed gradient on $(x, y)$ due
to $v$ to be $\mathrm{grad}^+_G[v](x, y) = \max\left\{ \frac{v(x) - v(y)}{\ell(x, y)}, 0 \right\}$. Given a partially-labelled directed graph $(G, v_0)$, we say that a
complete voltage assignment $v$ is a lex-minimizer if it extends $v_0$ and for every other complete voltage assignment $v'$ that
extends $v_0$ we have $\mathrm{grad}^+_G[v] \preceq \mathrm{grad}^+_G[v']$. We say that a partially-labelled directed graph $(G, v_0)$ is a well-posed
directed instance if every free vertex appears in a directed path between two terminals.
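The diode interpretation can be made concrete with a short Python sketch of the directed gradient: a directed edge only contributes when the voltage drops in its direction. The function name `directed_gradient` and the edge-dictionary format are hypothetical choices made for this illustration.

```python
def directed_gradient(edges, v):
    """Directed gradient on each edge: max((v[x] - v[y]) / length, 0).

    edges: dict (x, y) -> positive length, one entry per directed edge
    v: dict vertex -> voltage (complete assignment)
    """
    return {(x, y): max((v[x] - v[y]) / length, 0.0)
            for (x, y), length in edges.items()}


# Two opposite directed edges between a and b: only the downhill one is active.
grads = directed_gradient({("a", "b"): 2.0, ("b", "a"): 2.0},
                          {"a": 3.0, "b": 1.0})
```

Here the edge from a to b has directed gradient 1, while the reverse edge has directed gradient 0 because the voltage rises along it.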

The main difference between the directed and undirected cases is that the directed lex-minimizer is not necessarily 
unique. To maintain clarity of exposition, we chose to focus on undirected graphs so far. For directed graphs, we have 
the following corresponding structural results. 

Theorem 5.1 Given a well-posed instance $(G, v_0)$ on a directed graph $G$, there exists a lex-minimizer, and the set of
all lex-minimizers is a convex set. Moreover, for every two lex-minimizers $v$ and $v'$, we have $\mathrm{grad}^+_G[v] = \mathrm{grad}^+_G[v']$.

Because the lex-minimizer need not be unique in the directed case, we only have a weaker version
of Theorem 3.3 for directed graphs.

Theorem 5.2 Given a well-posed instance $(G, v_0)$ on a directed graph $G$, let $v_1$ be the partial voltage assignment
extending $v_0$ obtained by repeatedly fixing steepest fixable (directed) paths $P$ with $\nabla P > 0$. Then, any lex-minimizer
of $(G, v_0)$ must extend $v_1$. Moreover, for every edge $e \in E_G \setminus (T(v_1) \times T(v_1))$, any lex-minimizer $v$ of $(G, v_0)$ must
satisfy $\mathrm{grad}^+[v](e) = 0$.

When the value of the lex-minimizer at a vertex is not uniquely determined, it is constrained to an interval. In our
experiments, we pick the convention that when the voltage at a vertex is constrained to an interval $(-\infty, a]$ or $[a, \infty)$,
we assign $a$ to that vertex. When it is constrained to a finite interval, we assign a voltage closest to the median of the
original voltages.

6 Experiments on WebSpam 

We demonstrate the performance of our lex-minimization algorithms on directed graphs by using them to detect spam
webpages as in Zhou et al. (2007). We use the dataset webspam-uk2006-2.0 described in Castillo et al. (2006).
This collection includes 11,402 hosts, out of which 7,473 (65.5%) are labeled, either as spam or normal. Each host
corresponds to the collection of web pages it serves. Of the hosts, 1,924 are labeled spam (25.7% of all labels). We
consider the problem of flagging some hosts as spam, given only a small fraction of the labels for training. We assign
a value of 1 to the spam hosts, and a value of 0 to the normal ones. We then compute a lex-minimizer and examine the
effect of flagging as spam all hosts with a value greater than some threshold.

Following Zhou et al. (2007), we create edges between hosts with lengths equal to the reciprocal of the number of 
links from one to the other. We run our experiments only on the largest strongly connected component of the graph, 
which contains 7945 hosts of which 5552 are labeled. 16 % of the nodes in this subgraph are labeled spam. To create 
training and test data, for a given value p, we select a random subset of p % of the spam labels and a random subset 
of p % of the normal labels to use for training. The remaining labels are used for testing. We report results for p = 5 
and p = 20 . 
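
The data preparation described above can be sketched in Python as follows. This is only an illustration under stated
assumptions: link counts are given as a dictionary keyed by ordered host pairs, labels are 0 (normal) or 1 (spam), and
the number of training hosts per class is obtained by rounding, which the text does not specify. The names
`edge_lengths` and `stratified_split` are hypothetical.

```python
import random


def edge_lengths(link_counts):
    """Length of edge (a, b) is the reciprocal of the number of links from a to b."""
    return {edge: 1.0 / count for edge, count in link_counts.items() if count > 0}


def stratified_split(labels, p, seed=0):
    """Pick p% of the spam hosts and p% of the normal hosts for training; the rest is test data."""
    rng = random.Random(seed)
    train = set()
    for cls in (0, 1):
        hosts = sorted(h for h, y in labels.items() if y == cls)
        # Rounding to the nearest integer is an assumption of this sketch.
        train.update(rng.sample(hosts, round(len(hosts) * p / 100)))
    test = set(labels) - train
    return train, test
```

Sampling each class separately keeps the spam fraction of the training set close to that of the full labeled set.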

Again following Zhou et al. (2007), we plot the precision and recall of different choices of threshold for flagging 
pages as spam. Recall is the fraction of spam pages our algorithm flags as spam, and precision is the fraction of pages 
our algorithm flags as spam that actually are spam. Amongst the algorithms studied by Zhou et al. (2007), the top 
performer was their algorithm based on sampling according to a random-walk that follows in-links from other hosts.
We compare their algorithm with the classification we get by directing edges in the opposite directions of links. This 
has the effect that a link to a spam host is evidence of spamminess, and a link from a normal host is evidence of 
normality. 
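
For concreteness, precision and recall at a given threshold can be computed as in the following Python sketch. The
function name and the convention of reporting precision 1.0 when no host is flagged are assumptions made for
illustration.

```python
def precision_recall(scores, is_spam, threshold):
    """Flag hosts whose score exceeds the threshold and return (precision, recall)."""
    flagged = [h for h, s in scores.items() if s > threshold]
    true_pos = sum(1 for h in flagged if is_spam[h])
    total_spam = sum(1 for h in scores if is_spam[h])
    # Convention (assumption): precision is 1.0 when nothing is flagged.
    precision = true_pos / len(flagged) if flagged else 1.0
    recall = true_pos / total_spam if total_spam else 0.0
    return precision, recall
```

Sweeping the threshold over the computed voltages of the test hosts traces out a precision-recall curve.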

Results are shown in Figure 3. While we are not able to reliably flag all spam hosts, we see that in the range of
10-50% recall, we are able to flag spam with precision above 82%. We see that the performance of directed
lex-minimization does not degrade rapidly when going from the “large training set” regime of p = 20 to the “small
training set” regime of p = 5.


[Figure 3 plots omitted: two panels, "5% labels for training" and "20% labels for training", each with Recall on the
horizontal axis.]

Figure 3: Recall and precision in the web spam classification experiment. Each data point shown was computed as an average over
100 runs. The largest standard deviation of the mean precision across the plotted recall values was less than 1.3%. The algorithm
of Zhou et al. (2007) appears as RandWalk. Our directed lex-minimization algorithm appears as DirectedLex.


For comparison, in Appendix C, we show the performance of our algorithm and that of Zhou et al. (2007) both 
with link directions reversed, as well as the performance of undirected lex-minimization and Laplacian inference, all 
of which are significantly worse. 


7 $\ell_0$-Regularization of Vertex Values

We now explain how we can accommodate noise in both the given voltages and in the given lengths of edges. We can
find the minimum number of labels to ignore, or the minimum increase in edge lengths needed, so that there exists an
extension whose gradients have $\ell_\infty$-norm lower than a given target. After determining which labels to ignore or the
needed increment in edge lengths, we recommend computing a lex-minimizer.

The algorithms we present in this section are essentially the same for directed and undirected graphs. 

7.1 $\ell_0$-Vertex Regularization for Inf-minimization

The $\ell_0$-regularization of vertex labels can be viewed as a problem of outlier removal: the vector we compute is allowed
to disagree with $v_0$ on up to $k$ terminals. Given a voltage assignment $v$ and a subset $T \subset V$ of the vertices, by $v(T)$
we mean the vector obtained by restricting $v$ to $T$. We define the $\ell_0$-Vertex Regularization for $\ell_\infty$ problem to be

$$\min_{v} \|\mathrm{grad}_G[v]\|_\infty \quad \text{subject to} \quad \|v(T) - v_0(T)\|_0 \le k, \qquad (3)$$

where $v(T)$ is the vector of values of $v$ on the terminals $T$.

In Appendix D, we describe an approximation algorithm Approx-Outlier that approximately solves program (3). 
The precise statement we prove in Appendix D is given in the following theorem. 
Theorem 7.1 (Approximate $\ell_0$-vertex regularization) The algorithm Approx-Outlier takes a positive integer $k$
and a partially-labeled graph $(G, v_0)$, and outputs an assignment $v$ with $\|v(T) - v_0(T)\|_0 \le 2k$, and
$\|\mathrm{grad}_G[v]\|_\infty \le \alpha^*$, where $\alpha^*$ is the optimum value of program (3). The algorithm runs in time $O(k(m + n \log n))$.

In Appendix D, we also describe an algorithm OUTLIER that exactly solves program (3) in polynomial time, and we 
prove its correctness. 

Theorem 7.2 (Exact $\ell_0$-vertex regularization) The algorithm OUTLIER takes a positive integer $k$ and a
partially-labeled graph $(G, v_0)$, and solves program (3) exactly. The algorithm runs in polynomial time.

We give a proof of Theorem 7.2 in Appendix D. To do this, we reduce program (3) to the problem of minimizing
the $\ell_0$-budget needed to achieve a fixed gradient $\alpha$, using a binary search over a set of $O(n^2)$ gradients. This
latter problem we reduce in polynomial time to Minimum Vertex Cover (VC) on a transitively closed, directed acyclic
graph (a TC-DAG). VC on a TC-DAG can be solved exactly in polynomial time by a reduction to the Maximum
Bipartite Matching Problem (Fulkerson (1956)). The problem was phrased by Fulkerson as one of finding a maximum
antichain of a finite poset. Any transitively closed DAG corresponds directly to the comparability graph of a poset. A
maximum antichain of a poset is a maximum independent set of the comparability graph of the poset, and hence its
complement is a minimum vertex cover of the comparability graph. We refer to the algorithm developed by Fulkerson
as Konig-Cover.

Theorem 7.3 The algorithm Konig-Cover computes a minimum vertex cover for any transitively closed DAG G in 
polynomial time. 
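
To make the connection to bipartite matching concrete, the following Python sketch computes only the size of a minimum
vertex cover of a transitively closed DAG. It is not the Konig-Cover algorithm itself: it relies on the standard facts
that the minimum number of chains covering a poset equals $n$ minus the size of a maximum matching in the bipartite
"split" graph of the relation, and (by Dilworth's theorem) equals the size of a maximum antichain, so the minimum vertex
cover has the same size as the maximum matching. The matching is found with simple augmenting paths; the function name
is hypothetical, and the input is assumed to be already transitively closed.

```python
def min_vertex_cover_size_tc_dag(vertices, arcs):
    """Size of a minimum vertex cover of a transitively closed DAG, via bipartite matching."""
    succ = {u: [] for u in vertices}
    for u, w in arcs:
        succ[u].append(w)
    match_right = {}  # right copy of a vertex -> left copy it is matched to

    def augment(u, seen):
        # Try to match the left copy of u, re-routing earlier matches if necessary.
        for w in succ[u]:
            if w in seen:
                continue
            seen.add(w)
            if w not in match_right or augment(match_right[w], seen):
                match_right[w] = u
                return True
        return False

    # Maximum matching size = n - (minimum chain cover) = n - (maximum antichain).
    return sum(1 for u in vertices if augment(u, set()))
```

For a chain on three elements the comparability graph is a triangle and the cover has size 2; for three incomparable
elements there are no arcs and the cover is empty.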

7.2 Hardness of $\ell_0$ regularization for $\ell_2$

The result that $\ell_0$-regularized inf-minimization can be solved exactly in polynomial time is surprising, especially
because the analogous problem for 2-Laplacian minimization turns out to be NP-Hard.

We define the $\ell_0$ vertex regularization for $\ell_2$ for a partially-labeled graph $(G, v_0)$ and an integer $k$ by

$$\min_{v \in \mathbb{R}^n \,:\, \|v(T) - v_0(T)\|_0 \le k} v^T L v,$$

where $L$ is the Laplacian of $G$.

Theorem 7.4 $\ell_0$ vertex regularization for $\ell_2$ is NP-Hard.

In Appendix E we prove Theorem 7.4 by giving a polynomial time (Karp) reduction from the NP-Hard minimum
bisection problem to $\ell_0$ vertex regularization for $\ell_2$.

8 $\ell_1$-Edge and Vertex Regularization of Inf-minimizers

Consider a partially-labeled graph $(G, v_0)$ and an $\alpha > 0$. The set of voltage assignments given by

$$\{ v : v \text{ extends } v_0 \text{ and } \|\mathrm{grad}_G[v]\|_\infty \le \alpha \}$$

is convex. Going further, let us consider the edge lengths in a graph to be specified by a vector $\ell \in \mathbb{R}^m$. Now the set
of voltages $v$ and lengths $\ell$ which achieve $\|\mathrm{grad}_{G(\ell)}[v]\|_\infty \le \alpha$ is jointly convex in $v$ and $\ell$. To see this, observe
that

$$\|\mathrm{grad}_{G(\ell)}[v]\|_\infty \le \alpha \iff \forall (x, y) \in E : -\alpha\,\ell(x, y) \le v(x) - v(y) \le \alpha\,\ell(x, y). \qquad (4)$$

Furthermore, the condition “$v$ extends $v_0$” is a linear constraint on $v$, which we express as $v(T) = v_0(T)$. From
the above, it is clear that the gradient condition corresponds to a convex set, as it is an intersection of half-spaces.
These half-spaces are given by $O(m)$ linear inequalities. We can leverage this to phrase many regularized variants of
inf-minimization as convex programs, and in some cases linear programs.
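
The linear inequalities in (4) are easy to check numerically. The Python sketch below is an illustration only: it
assumes edge lengths and voltages are stored in dictionaries, and the helper names are hypothetical. It tests the
constraints edge by edge and forms a convex combination of two (voltage, length) pairs, which remains feasible when
both pairs are feasible.

```python
def satisfies_gradient_bound(lengths, v, alpha):
    """Check -alpha*len(x, y) <= v(x) - v(y) <= alpha*len(x, y) on every edge, as in (4)."""
    return all(abs(v[x] - v[y]) <= alpha * ell for (x, y), ell in lengths.items())


def mix(d1, d2, lam):
    """Convex combination lam*d1 + (1 - lam)*d2 of two dictionaries with the same keys."""
    return {k: lam * d1[k] + (1 - lam) * d2[k] for k in d1}


# Hypothetical single-edge instance with alpha = 1: both (v, length) pairs are feasible.
alpha = 1.0
v_a, len_a = {"a": 0.0, "b": 1.0}, {("a", "b"): 1.0}
v_b, len_b = {"a": 0.0, "b": 4.0}, {("a", "b"): 4.0}
v_mid, len_mid = mix(v_a, v_b, 0.5), mix(len_a, len_b, 0.5)
```

Because each constraint is linear in both the voltages and the lengths, the midpoint pair satisfies the same bound.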
For example, we may consider a variant of inf-minimization combined with an $\ell_1$-budget for changing lengths of
edges and values on terminals. Given a parameter $\gamma > 0$ which specifies the relative cost of regularizing terminals to
regularizing edges, the problem is as follows:

$$\operatorname*{arg\,min}_{v \in \mathbb{R}^n,\ s \in \mathbb{R}^m,\ s \ge 0} \|s\|_1 + \gamma \|v(T) - v_0(T)\|_1 \quad \text{subject to} \quad \|\mathrm{grad}_{G(\ell + s)}[v]\|_\infty \le \alpha. \qquad (5)$$

From our observation (4), it follows that problem (5) may be expressed as a linear program with $O(n)$ variables
and $O(m)$ constraints. We can use ideas from Daitch and Spielman (2008) to solve the resulting linear program in
time $\widetilde{O}(m^{3/2})$ by an interior point method with a special-purpose linear equation solver. The reason is that the linear
equations the IPM must solve at each iteration may be reduced to linear equations in symmetric, diagonally dominant
matrices, and these may be solved in nearly-linear time (Cohen et al. (2014)).
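
One standard way to write problem (5) explicitly as a linear program is sketched below. The auxiliary vector $z$, which
bounds the absolute deviation on each terminal, is an assumption of this sketch and not notation from the paper; at an
optimum, $z(t) = |v(t) - v_0(t)|$.

```latex
\begin{aligned}
\min_{v,\, s,\, z} \quad & \sum_{(x,y) \in E} s(x,y) \;+\; \gamma \sum_{t \in T} z(t) \\
\text{subject to} \quad
 & -\alpha \bigl(\ell(x,y) + s(x,y)\bigr) \;\le\; v(x) - v(y) \;\le\; \alpha \bigl(\ell(x,y) + s(x,y)\bigr)
   && \forall (x,y) \in E, \\
 & -z(t) \;\le\; v(t) - v_0(t) \;\le\; z(t) && \forall t \in T, \\
 & s(x,y) \;\ge\; 0 && \forall (x,y) \in E.
\end{aligned}
```

The first family of constraints is observation (4) applied to the lengths $\ell + s$; the second linearizes the
$\ell_1$ penalty on terminal values.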


Conclusion. We propose the use of inf and lex minimizers for regression on graphs. We present simple algorithms 
for computing them that are provably fast and correct, and can also be implemented efficiently. We also present a 
framework and polynomial time algorithms for regularization in this setting. The initial experiments reported in the 
paper indicate that these algorithms give good results on real and synthetic datasets. The results seem to compare
quite favorably to other algorithms, particularly in the regime of tiny labeled sets. We are testing these algorithms on 
several other graph learning questions, and plan to report on them in a forthcoming experimental paper. We believe 
that inf and lex minimizers, and the associated ideas presented in the paper, should be useful primitives that can be 
profitably combined with other approaches to learning on graphs. 
A Basic Properties of Lex-Minimizers

A.1 Meta Algorithm


Algorithm 1: Algorithm Meta-Lex: Given a well-posed instance $(G, v_0)$, outputs $\mathrm{lex}_G[v_0]$.
for $i = 1, 2, \ldots$:

1. if $T(v_{i-1}) = V_G$, then return $v_{i-1}$.

2. $E' := E_G \setminus (T(v_{i-1}) \times T(v_{i-1}))$, $G' := (V_G, E')$.

3. Let $P_i^\star$ be a steepest fixable path in $(G', v_{i-1})$. Let $\alpha_i^\star \leftarrow \nabla P_i^\star(v_{i-1})$.

4. $v_i \leftarrow \mathrm{fix}[v_{i-1}, P_i^\star]$.
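
The meta algorithm can be illustrated by the following brute-force Python sketch for small undirected graphs. It is
not the authors' implementation and makes no attempt at efficiency: in each round it runs Dijkstra from every
terminal, restricted to paths whose interior vertices are free, picks the terminal pair with the largest gradient,
and fixes the interior of that path by linear interpolation. The adjacency-list format, the function names, and the
handling of free vertices that hang off a single terminal (they receive that terminal's voltage, as a gradient-0
path would assign) are assumptions of this sketch. Edge lengths are assumed positive and vertex names comparable.

```python
import heapq


def _dijkstra_free(adj, terminals, source):
    """Shortest paths from a terminal `source` whose interior vertices are all free."""
    dist, parent, done = {source: 0.0}, {source: None}, set()
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        if x != source and x in terminals:
            continue  # a terminal ends the path; never pass through it
        for y, length in adj[x]:
            if x == source and y in terminals:
                continue  # terminal-terminal edges are removed, as in step 2
            if y not in dist or d + length < dist[y]:
                dist[y], parent[y] = d + length, x
                heapq.heappush(heap, (d + length, y))
    return dist, parent


def meta_lex(adj, v0):
    """Brute-force lex-minimizer; adj maps each vertex to a list of (neighbor, length)."""
    v = dict(v0)
    while len(v) < len(adj):
        best, reach = None, {}
        for s in list(v):
            dist, parent = _dijkstra_free(adj, v, s)
            for t, d in dist.items():
                if t not in v:
                    reach[t] = s
                elif t != s and v[s] >= v[t]:
                    grad = (v[s] - v[t]) / d
                    if best is None or grad > best[0]:
                        best = (grad, s, t, dist, parent)
        if best is None:
            # No free path joins two distinct terminals: the remaining free
            # vertices hang off a single terminal and take its voltage.
            if not reach:
                raise ValueError("instance is not well-posed")
            for x, s in reach.items():
                v[x] = v[s]
            continue
        grad, s, t, dist, parent = best
        x = parent[t]
        while x != s:  # fix the interior of the steepest path
            v[x] = v[s] - grad * dist[x]
            x = parent[x]
    return v
```

On a path a, x, y, b with unit lengths, terminals a = 0 and b = 3, and an extra vertex z attached only to x, the
sketch assigns x = 1, y = 2 and z = 1.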

In this subsection, we prove the results that appeared in section 2. We start with a simple observation. 

Proposition A.1 Given a well-posed instance $(G, v_0)$ such that $T(v_0) \ne V$, let $P$ be a steepest fixable path in $(G, v_0)$.
Then, $\mathrm{fix}[v_0, P]$ extends $v_0$, and $(G, \mathrm{fix}[v_0, P])$ is also a well-posed instance.

The properties we prove below do not depend on the choice of the steepest fixable path.

Proposition A.2 For any well-posed instance $(G, v_0)$ with $|V_G| = n$, Meta-Lex$(G, v_0)$ terminates in at most $n$
iterations, and outputs a complete voltage assignment $v$ that extends $v_0$.

Proof of Proposition A.2: By Proposition A.1, at any iteration $i$, $v_{i-1}$ extends $v_0$ and $(G', v_{i-1})$ is a well-posed
instance. Meta-Lex outputs $v_{i-1}$ iff $T(v_{i-1}) = V$, which means $v_{i-1}$ is a complete voltage assignment. For
any $v_{i-1}$ that is not complete, for any $x \in V \setminus T(v_{i-1})$, we must have a free terminal path in $(G', v_{i-1})$ that contains $x$.
Hence, a steepest fixable path $P_i^\star$ exists in $(G', v_{i-1})$. Since $P_i^\star$ is a free terminal path, $\mathrm{fix}[v_{i-1}, P_i^\star]$ fixes the voltage
for at least one non-terminal. Thus, Meta-Lex$(G, v_0)$ must complete in at most $n$ iterations. □

For the following lemmas, consider a run of Meta-Lex with well-posed instance $(G, v_0)$ as input. Let $v_{\mathrm{out}}$ be the
complete voltage assignment output by Meta-Lex. Let $E_i$ be the set of edges $E'$ and $G_i$ be the graph $G'$ constructed
in iteration $i$ of Meta-Lex.

Lemma A.3 For every edge $e \in E_{i-1} \setminus E_i$, we have $|\mathrm{grad}[v_{\mathrm{out}}](e)| \le \alpha_i^\star$. Moreover, $\alpha_i^\star$ is non-increasing with $i$.

Proof of Lemma A.3: Let $P_i^\star = (x_0, \ldots, x_r)$ be a steepest fixable path in iteration $i$ (when we deal with instance
$(G_{i-1}, v_{i-1})$). Consider a terminal path $P_{i+1}$ in $(G_i, v_i)$ such that $\{\partial_0 P_{i+1}, \partial_1 P_{i+1}\} \cap (T(v_i) \setminus T(v_{i-1})) \ne \emptyset$. We
claim that $\nabla P_{i+1}(v_i) \le \alpha_i^\star$. On the contrary, assume that $\nabla P_{i+1}(v_i) > \alpha_i^\star$. Consider the case $\partial_0 P_{i+1} \in T(v_i) \setminus
T(v_{i-1})$, $\partial_1 P_{i+1} \in T(v_{i-1})$. By the definition of $v_i$, we must have $\partial_0 P_{i+1} = x_j$ for some $j \in [r - 1]$. Let $P'_{i+1}$ be the
path formed by joining paths $P_i^\star[x_0 : x_j]$ and $P_{i+1}$. $P'_{i+1}$ is a free terminal path in $(G_{i-1}, v_{i-1})$. We have,

$$v_{i-1}(x_0) - v_{i-1}(\partial_1 P_{i+1}) = (v_i(x_0) - v_i(x_j)) + (v_i(\partial_0 P_{i+1}) - v_i(\partial_1 P_{i+1}))$$
$$> \alpha_i^\star \cdot \ell(P_i^\star[x_0 : x_j]) + \alpha_i^\star \cdot \ell(P_{i+1}) = \alpha_i^\star \cdot \ell(P'_{i+1}),$$

giving $\nabla P'_{i+1}(v_{i-1}) > \alpha_i^\star$, which is a contradiction since the steepest fixable path $P_i^\star$ in $(G_{i-1}, v_{i-1})$ has gradient $\alpha_i^\star$.
The other cases can be handled similarly.

Applying the above claim to an edge $e \in E_{i-1} \setminus E_i$, whose gradient is fixed for the first time in iteration $i$, we
obtain that $\mathrm{grad}[v_{i+1}](e) \le \alpha_i^\star$. Since $v_{\mathrm{out}}$, the complete voltage assignment output by Meta-Lex, extends $v_{i+1}$,
we get $\mathrm{grad}[v_{\mathrm{out}}](e) \le \alpha_i^\star$. Applying the claim to the symmetric edge, we obtain $-\mathrm{grad}[v_{\mathrm{out}}](e) \le \alpha_i^\star$, implying
$|\mathrm{grad}[v_{\mathrm{out}}](e)| \le \alpha_i^\star$.

Consider any free terminal path $P_{i+1}$ in $(G_i, v_i)$. If $P_{i+1}$ is also a terminal path in $(G_{i-1}, v_{i-1})$, it is a free
terminal path in $(G_{i-1}, v_{i-1})$. In addition, since a steepest fixable path $P_i^\star$ in $(G_{i-1}, v_{i-1})$ has $\nabla P_i^\star = \alpha_i^\star$, we get
$\nabla P_{i+1}(v_i) = \nabla P_{i+1}(v_{i-1}) \le \alpha_i^\star$. Otherwise, we must have $\{\partial_0 P_{i+1}, \partial_1 P_{i+1}\} \cap (T(v_i) \setminus T(v_{i-1})) \ne \emptyset$, and we can
deduce $\nabla P_{i+1}(v_i) \le \alpha_i^\star$ using the above claim. Thus, all free terminal paths $P_{i+1}$ in $(G_i, v_i)$ satisfy $\nabla P_{i+1}(v_i) \le \alpha_i^\star$.
In particular, $\alpha_{i+1}^\star = \nabla P_{i+1}^\star(v_i) \le \alpha_i^\star$. Thus, $\alpha_i^\star$ is non-increasing with $i$. □
Lemma A.4 For any complete voltage assignment $v$ for $G$ that extends $v_0$, if $v \ne v_{\mathrm{out}}$, we have $\mathrm{grad}[v] \not\preceq \mathrm{grad}[v_{\mathrm{out}}]$,
and hence $\mathrm{grad}[v_{\mathrm{out}}] \prec \mathrm{grad}[v]$.

Proof of Lemma A.4: Consider any complete voltage assignment $v$ for $G$ that extends $v_0$, such that $v \ne v_{\mathrm{out}}$. Thus,
there exists a unique $i$ such that $v$ extends $v_{i-1}$ but does not extend $v_i$. We will argue that $\mathrm{grad}[v] \not\preceq \mathrm{grad}[v_{\mathrm{out}}]$, and
hence $\mathrm{grad}[v_{\mathrm{out}}] \prec \mathrm{grad}[v]$. For every edge $e \in E \setminus E_{i-1}$ that has been fixed so far, $\mathrm{grad}[v](e) = \mathrm{grad}[v_{i-1}](e) =
\mathrm{grad}[v_{\mathrm{out}}](e)$, and hence we can ignore these edges.

Since $v$ extends $v_{i-1}$ but not $v_i$, there exists an $x \in T(v_i) \setminus T(v_{i-1})$ such that $v(x) \ne v_i(x) = v_{\mathrm{out}}(x)$. Assume
$v(x) < v_i(x)$ (the other case is symmetric). If $P_i^\star = (x_0, \ldots, x_r)$ is the steepest fixable path with gradient $\alpha_i^\star$ picked
in iteration $i$, we must have $x = x_j$ for some $j \in [r - 1]$. Thus,

$$\sum_{k=1}^{j} (v(x_{k-1}) - v(x_k)) = v(x_0) - v(x_j) > v_i(x_0) - v_i(x_j) = \alpha_i^\star \cdot \ell(P_i^\star[x_0 : x_j]) = \alpha_i^\star \cdot \sum_{k=1}^{j} \ell(x_{k-1}, x_k).$$

Thus, for some $k \in [j]$, we must have $\mathrm{grad}[v](x_{k-1}, x_k) > \alpha_i^\star$. Since $P_i^\star$ is a path in $G_{i-1}$, we have $\{x_{k-1}, x_k\} \not\subseteq
T(v_{i-1})$. This gives $(x_{k-1}, x_k) \in E_{i-1} \setminus E_i$. But then, from Lemma A.3, it follows that for all $e \in E_{i-1} \setminus E_i$, we
have $|\mathrm{grad}[v_{\mathrm{out}}](e)| \le \alpha_i^\star$. Thus, we have $\mathrm{grad}[v] \not\preceq \mathrm{grad}[v_{\mathrm{out}}]$. □

Lemma A.5 Let $P = (x_0, \ldots, x_r)$ be a steepest fixable path that does not have any edges in $T(v_0) \times T(v_0)$,
and let $v_1 = \mathrm{fix}_G[v_0, P]$. Then for every $i \in [r]$, we have $\mathrm{grad}[v_1](x_{i-1}, x_i) = \nabla P$.

Proof of Lemma A.5: Suppose this is not true and let $j \in [r]$ be the minimum number such that $\mathrm{grad}[v_1](x_{j-1}, x_j) \ne
\nabla P$. By definition of $v_1$ we would necessarily have $j < r$ and $x_j \in T(v_0)$. Suppose $\mathrm{grad}[v_1](x_{j-1}, x_j) < \nabla P$. We
would then have $v_1(x_0) - v_1(x_j) < \nabla P \cdot \ell(P[x_0 : x_j])$. Since $P$ does not have any edges in $T(v_0) \times T(v_0)$,
$P_1 := (x_j, \ldots, x_r)$ would be a free terminal path with $\nabla P_1 > \nabla P$. This is a contradiction. Other cases can be ruled
out similarly. □

Proof of Theorem 3.3: Consider an arbitrary run of Meta-Lex on $(G, v_0)$. Let $v_{\mathrm{out}}$ be the complete voltage
assignment output by Meta-Lex. Proposition A.1 implies that $v_{\mathrm{out}}$ extends $v_0$. Lemma A.4 implies that for any
complete voltage assignment $v \ne v_{\mathrm{out}}$ that extends $v_0$, we have $\mathrm{grad}[v_{\mathrm{out}}] \preceq \mathrm{grad}[v]$. Thus, $v_{\mathrm{out}}$ is a lex-minimizer.
Moreover, the lemma also gives that for any such $v$, $\mathrm{grad}[v] \not\preceq \mathrm{grad}[v_{\mathrm{out}}]$, and hence $v_{\mathrm{out}}$ is the unique lex-minimizer.
Thus, $v_{\mathrm{out}}$ is the unique voltage assignment satisfying Def. 2.1, and we denote it as $\mathrm{lex}_G[v_0]$. Since we started with an
arbitrary run of Meta-Lex, uniqueness implies that every run of Meta-Lex on $(G, v_0)$ must output $\mathrm{lex}_G[v_0]$. □

Proof of Lemma 3.5: Suppose we have a complete voltage assignment $v$ extending $v_0$, such that $\|\mathrm{grad}[v]\|_\infty \le \alpha$.
For any terminal path $P = (x_0, \ldots, x_r)$, we get

$$\ell(P) \cdot \nabla P(v_0) = v_0(\partial_0 P) - v_0(\partial_1 P) = v(\partial_0 P) - v(\partial_1 P) = \sum_{i=1}^{r} \mathrm{grad}[v](x_{i-1}, x_i) \cdot \ell(x_{i-1}, x_i) \le \alpha \cdot \sum_{i=1}^{r} \ell(x_{i-1}, x_i) = \alpha \cdot \ell(P),$$

giving $\nabla P(v_0) \le \alpha$.

On the other hand, suppose every terminal path $P$ in $(G, v_0)$ satisfies $\nabla P(v_0) \le \alpha$. Consider $v = \mathrm{lex}_G[v_0]$. We
know that $v$ extends $v_0$. Every edge $e \in E_G \cap (T(v_0) \times T(v_0))$ is a (trivial) terminal path in $(G, v_0)$, and hence
satisfies $\mathrm{grad}[v](e) = \mathrm{grad}[v_0](e) = \nabla e(v_0) \le \alpha$. Considering the reverse edge, we also obtain $-\mathrm{grad}[v](e) \le \alpha$.
Thus, $|\mathrm{grad}[v](e)| \le \alpha$. Moreover, using Lemma A.3, we know that for every edge $e \in E_G \setminus (T(v_0) \times T(v_0))$, $|\mathrm{grad}[v](e)| \le
\alpha_1^\star = \nabla P_1^\star \le \alpha$, since $P_1^\star$ is a terminal path in $(G, v_0)$. Thus, for every $e \in E_G$, $|\mathrm{grad}[v](e)| \le \alpha$, and hence
$\|\mathrm{grad}[v]\|_\infty \le \alpha$. □

A.2 Stability 

In this subsection, we sketch a proof of the monotonicity of lex-minimizers and show how it implies the stability 
property claimed earlier. 

For any well-posed $(G, v_0)$, there could be several possible executions of Meta-Lex, each characterized by the
sequence of paths $P_i^\star$. We can apply Theorem 3.3 to deduce the following structural result about the lex-minimizer.

Corollary A.6 For any well-posed instance $(G, v_0)$, consider a sequence of paths $(P_1, \ldots, P_r)$ and voltage
assignments $(v_1, \ldots, v_r)$ for some positive integer $r$ such that:

1. $P_i$ is a steepest fixable path in $(G_{i-1}, v_{i-1})$ for $i = 1, \ldots, r$.

2. $v_i = \mathrm{fix}[v_{i-1}, P_i]$ for $i = 1, \ldots, r$.

3. $T(v_r) = V_G$.

Then, we have $v_r = \mathrm{lex}_G[v_0]$.

We call such a sequence of paths and voltages a decomposition of $\mathrm{lex}_G[v_0]$. Again, note that $\mathrm{lex}_G[v_0]$ can
have multiple decompositions. However, any two such decompositions are consistent in the sense that they
produce the same voltage assignment.

Proof of Corollary 3.7: We first define some operations on partial assignments which simplify the notation. Let
$v_0, v_1$ be any two partial assignments with the same set of terminals $T := T(v_0) = T(v_1)$, and let $c, d \in \mathbb{R}$. By $c v_0 + d$
we mean a partial assignment $v$ with $T(v) = T$ satisfying $v(t) = c v_0(t) + d$ for all $t \in T$. Also, by $v_0 + v_1$ we
mean a partial assignment $v$ with $T(v) = T$ satisfying $v(t) = v_0(t) + v_1(t)$ for all $t \in T$. Also, we say $v_1 \ge v_0$ if
$v_1(t) \ge v_0(t)$ for all $t \in T$.

Now we can show how Corollary 3.7 follows from Theorem 3.6. Let $v := v_1 - v_0$, and $\|v\|_\infty = \epsilon$, for some $\epsilon > 0$.
Therefore, $v_0 + \epsilon \ge v_1 \ge v_0 - \epsilon$. Theorem 3.6 then implies that $\mathrm{lex}_G[v_0] + \epsilon \ge \mathrm{lex}_G[v_1] \ge \mathrm{lex}_G[v_0] - \epsilon$, hence proving
the corollary. □

Proof sketch of Theorem 3.6: It is easy to see that the first statement holds. For the second statement, we first
observe that if there is a sequence of paths $P_1, \ldots, P_r$ that is simultaneously a decomposition of both $\mathrm{lex}[v_0]$ and
$\mathrm{lex}[v_1]$, then this is easy to see. If such a path sequence does not exist, then we look at $v_t := v_0 + t(v_1 - v_0)$. We
state here without proof (though the proof is elementary) that we can then split the interval $[0, 1]$ into finitely many
subintervals $[a_0, a_1], [a_1, a_2], \ldots, [a_{k-1}, a_k]$, with $a_0 = 0$, $a_k = 1$, such that for any $i$, there is a path sequence $P_1, \ldots, P_r$
which is a decomposition of $\mathrm{lex}[v_t]$ for all $t \in [a_i, a_{i+1}]$. We then observe that $v_0 = v_{a_0} \le v_{a_1} \le \ldots \le v_{a_k} = v_1$. Since
for every $a_i, a_{i+1}$, there is a path sequence which is simultaneously a decomposition of both $\mathrm{lex}[v_{a_i}]$ and $\mathrm{lex}[v_{a_{i+1}}]$,
we immediately get

$$\mathrm{lex}[v_0] = \mathrm{lex}[v_{a_0}] \le \mathrm{lex}[v_{a_1}] \le \ldots \le \mathrm{lex}[v_{a_k}] = \mathrm{lex}[v_1].$$

□


A.3 Alternate Characterizations 

Proof of Theorem 3.10: We know that $\mathrm{lex}_G[v_0]$ extends $v_0$. We first prove that $v = \mathrm{lex}_G[v_0]$ satisfies the max-min
gradient averaging property. Assume to the contrary. Thus, there exists $x \in V_G \setminus T(v_0)$ such that

$$\max_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y) \ne - \min_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y).$$

Assume that $\max_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y) > - \min_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y)$. Then, consider $v'$ extending $v_0$ that is
identical to $v$ except for $v'(x) = v(x) - \epsilon$ for $\epsilon > 0$. For $\epsilon$ small enough, we get that

$$\max_{y : (x,y) \in E_G} \mathrm{grad}[v'](x, y) < \max_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y)$$

and

$$- \min_{y : (x,y) \in E_G} \mathrm{grad}[v'](x, y) < \max_{y : (x,y) \in E_G} \mathrm{grad}[v](x, y).$$

The gradient of edges not incident on the vertex $x$ is left unchanged. This implies that $\mathrm{grad}[v] \not\preceq \mathrm{grad}[v']$,
contradicting the assumption that $v$ is the lex-minimizer. (The other case is similar.)
For the other direction, consider a complete voltage assignment $v$ extending $v_0$ that satisfies the max-min gradient
averaging property w.r.t. $(G, v_0)$. Let

$$\alpha = \max_{(x,y) \in E_G,\ x \in V \setminus T(v_0)} \mathrm{grad}[v](x, y) \ge 0$$

be the maximum edge gradient, and consider any edge $(x_0, x_1) \in E_G$ such that $\mathrm{grad}[v](x_1, x_0) = \alpha$, with $x_1 \in
V \setminus T(v_0)$. If $\alpha = 0$, $\mathrm{grad}[v]$ is identically zero, and is trivially the lex-minimal gradient assignment. Thus, both $v$ and
$\mathrm{lex}_G[v_0]$ are constant on each connected component. Since $(G, v_0)$ is well-posed, there is at least one terminal in each
component, and hence $v$ and $\mathrm{lex}_G[v_0]$ must be identical.

Now assume $\alpha > 0$. By the max-min gradient averaging property, there exists $x_2 \in V_G$ such that $(x_1, x_2) \in E_G$ and

$$\mathrm{grad}[v](x_1, x_2) = \min_{y : (x_1, y) \in E_G} \mathrm{grad}[v](x_1, y) = - \max_{y : (x_1, y) \in E_G} \mathrm{grad}[v](x_1, y) \le -\mathrm{grad}[v](x_1, x_0) = -\alpha.$$

Thus, $\mathrm{grad}[v](x_2, x_1) \ge \alpha$. Since $\alpha$ is the maximum edge gradient, we must have $\mathrm{grad}[v](x_2, x_1) = \alpha$. Moreover,
$v(x_2) > v(x_1) > v(x_0)$, thus $x_2 \ne x_0$. We can inductively apply this argument at $x_2$ until we hit a terminal.
Similarly, if $x_0 \notin T(v_0)$ we can extend the path in the other direction. Consequently, we obtain a path
$P = (x_j, \ldots, x_2, x_1, x_0, x_{-1}, \ldots, x_k)$ with all vertices distinct, such that $x_j, x_k \in T(v_0)$, and $x_i \in V \setminus T(v_0)$
for all $i \in [k + 1, j - 1]$. Moreover, $\mathrm{grad}[v](x_i, x_{i-1}) = \alpha$ for all $k < i \le j$. Thus, $P$ is a free terminal path with
$\nabla P[v_0] = \alpha$.

Moreover, since $v$ is a voltage assignment extending $v_0$ with $\|\mathrm{grad}[v]\|_\infty = \alpha$, using Lemma 3.5, we know that
every terminal path $P'$ in $(G, v_0)$ must satisfy $\nabla P'(v_0) \le \alpha$. Thus, $P$ is a steepest fixable path in $(G, v_0)$. Thus,
letting $v_1 = \mathrm{fix}[v_0, P]$, using Corollary 3.4, we obtain that $\mathrm{lex}_G[v_1] = \mathrm{lex}_G[v_0]$. Moreover, since $\alpha = \nabla P[v_0] =
\mathrm{grad}[v](x_i, x_{i-1})$ for all $i \in (k, j]$, we get $v_1(x_i) = v(x_i)$ for all $i \in (k, j)$. Thus, $v$ extends $v_1$.

We can iterate this argument for $r$ iterations until $T(v_r) = V_G$, giving $v = v_r$ and $v_r = \mathrm{lex}_G[v_r] = \mathrm{lex}_G[v_0]$.
(Since we are fixing at least one terminal at each iteration, this procedure terminates.) Thus, we get $v = \mathrm{lex}_G[v_0]$. □

B Description of the Algorithms 


Algorithm 2: ModDijkstra$(G, v_0, \alpha)$: Given a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$, outputs a complete
voltage assignment $v$ for $G$, and an array parent : $V \to V \cup \{\mathrm{null}\}$.

1. for $x \in V_G$
2.     Add $x$ to a Fibonacci heap, with key$(x) = +\infty$.
3.     finished$(x) \leftarrow$ false
4. for $x \in T(v_0)$
5.     Decrease key$(x)$ to $v_0(x)$.
6.     parent$(x) \leftarrow$ null.
7. while heap is not empty
8.     $x \leftarrow$ pop element with minimum key from heap
9.     $v(x) \leftarrow$ key$(x)$. finished$(x) \leftarrow$ true.
10.     for $y : (x, y) \in E_G$
11.         if finished$(y)$ = false
12.             if key$(y) > v(x) + \alpha \cdot \ell(x, y)$
13.                 Decrease key$(y)$ to $v(x) + \alpha \cdot \ell(x, y)$.
14.                 parent$(y) \leftarrow x$.
15. return $(v, \mathrm{parent})$
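
A minimal Python sketch of this procedure is given below. It is an illustration under stated assumptions, not the
authors' code: it uses the standard-library binary heap `heapq` with lazy deletion of stale entries in place of a
Fibonacci heap (which changes the constants and the asymptotic bound but not the output), the graph is an adjacency
list mapping each vertex to (neighbor, length) pairs, and vertex names are assumed comparable so that heap ties can be
broken.

```python
import heapq
import math


def mod_dijkstra(adj, v0, alpha):
    """Return (v, parent) with v(x) = min over terminals t of v0(t) + alpha * dist(x, t)."""
    key = {x: math.inf for x in adj}
    parent = {x: None for x in adj}
    for t, val in v0.items():
        key[t] = val
    heap = [(val, t) for t, val in v0.items()]
    heapq.heapify(heap)
    v = {}
    while heap:
        k, x = heapq.heappop(heap)
        if x in v or k > key[x]:
            continue  # stale heap entry: x is finished or has a smaller key
        v[x] = k
        for y, length in adj[x]:
            cand = k + alpha * length
            if y not in v and cand < key[y]:
                key[y] = cand  # plays the role of "decrease key"
                parent[y] = x
                heapq.heappush(heap, (cand, y))
    return v, parent
```

As in the pseudocode, a terminal keeps its given value only if no other terminal offers a smaller key, which holds
whenever $\alpha$ is at least the largest terminal-path gradient.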

Theorem B.1 For a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$, let $(v, \mathrm{parent}) \leftarrow$ ModDijkstra$(G, v_0, \alpha)$.
Then, $v$ is a complete voltage assignment such that, $\forall x \in V_G$, $v(x) = \min_{t \in T(v_0)} \{v_0(t) + \alpha \cdot \mathrm{dist}(x, t)\}$. Moreover, the
pointer array parent satisfies $\forall x \notin T(v_0)$, parent$(x) \ne$ null and $v(x) = v(\mathrm{parent}(x)) + \alpha \cdot \ell(x, \mathrm{parent}(x))$.
Algorithm 3: Algorithm CompVLow$(G, v_0, \alpha)$: Given a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$, outputs
vLow, a complete voltage assignment for $G$, and an array LParent : $V \to V \cup \{\mathrm{null}\}$.

1. (vLow, LParent) $\leftarrow$ ModDijkstra$(G, v_0, \alpha)$
2. return (vLow, LParent)


Algorithm 4: Algorithm CompVHigh$(G, v_0, \alpha)$: Given a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$, outputs
vHigh, a complete voltage assignment for $G$, and an array HParent : $V \to V \cup \{\mathrm{null}\}$.

1. for $x \in V_G$
2.     if $x \in T(v_0)$ then $v_1(x) \leftarrow -v_0(x)$ else $v_1(x) \leftarrow v_0(x)$.
3. (temp, HParent) $\leftarrow$ ModDijkstra$(G, v_1, \alpha)$
4. for $x \in V_G$: vHigh$(x) \leftarrow -$temp$(x)$
5. return (vHigh, HParent)


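Corollary B.2 For a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$, let (vLow$[\alpha]$, LParent) $\leftarrow$ CompVLow$(G, v_0, \alpha)$
and (vHigh$[\alpha]$, HParent) $\leftarrow$ CompVHigh$(G, v_0, \alpha)$. Then, vLow$[\alpha]$, vHigh$[\alpha]$ are complete voltage assignments for
$G$ such that, $\forall x \in V_G$,

$$\mathrm{vLow}[\alpha](x) = \min_{t \in T(v_0)} \{v_0(t) + \alpha \cdot \mathrm{dist}(x, t)\}, \qquad \mathrm{vHigh}[\alpha](x) = \max_{t \in T(v_0)} \{v_0(t) - \alpha \cdot \mathrm{dist}(t, x)\}.$$

Moreover, the pointer arrays LParent, HParent satisfy $\forall x \notin T(v_0)$, LParent$(x)$, HParent$(x) \ne$ null and

$$\mathrm{vLow}[\alpha](x) = \mathrm{vLow}[\alpha](\mathrm{LParent}(x)) + \alpha \cdot \ell(x, \mathrm{LParent}(x)),$$
$$\mathrm{vHigh}[\alpha](x) = \mathrm{vHigh}[\alpha](\mathrm{HParent}(x)) - \alpha \cdot \ell(x, \mathrm{HParent}(x)).$$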


Algorithm 5: Algorithm CompInfMin$(G, v_0)$: Given a well-posed instance $(G, v_0)$, outputs a complete voltage assignment
$v$ for $G$, extending $v_0$, that minimizes $\|\mathrm{grad}[v]\|_\infty$.

1. $\alpha \leftarrow \max\{|\mathrm{grad}[v_0](e)| \;:\; e \in E_G \cap (T(v_0) \times T(v_0))\}$.
2. $E_G \leftarrow E_G \setminus (T(v_0) \times T(v_0))$
3. $P \leftarrow$ SteepestPath$(G, v_0)$.
4. $\alpha \leftarrow \max\{\alpha, \nabla P(v_0)\}$
5. (vLow, LParent) $\leftarrow$ CompVLow$(G, v_0, \alpha)$
6. (vHigh, HParent) $\leftarrow$ CompVHigh$(G, v_0, \alpha)$
7. for $x \in V_G$
8.     if $x \in T(v_0)$
9.         then $v(x) \leftarrow v_0(x)$
10.         else $v(x) \leftarrow \frac{1}{2} \cdot (\mathrm{vLow}(x) + \mathrm{vHigh}(x))$.
11. return $v$
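
Steps 5 to 10 can be sketched in Python as follows for an undirected graph. This is an illustration under stated
assumptions rather than the authors' implementation: the gradient bound `alpha` is supplied by the caller instead of
being computed by SteepestPath, a binary heap replaces the Fibonacci heap, and the adjacency-list format and function
names are hypothetical. The upper envelope vHigh is obtained, as in Algorithm 4, by negating the terminal values,
running the same propagation, and negating the result.

```python
import heapq


def _propagate(adj, start, alpha):
    """Map each vertex x to the minimum over t of start[t] + alpha * dist(x, t)."""
    key = dict(start)
    heap = [(val, t) for t, val in start.items()]
    heapq.heapify(heap)
    done = {}
    while heap:
        k, x = heapq.heappop(heap)
        if x in done:
            continue  # stale entry
        done[x] = k
        for y, length in adj[x]:
            cand = k + alpha * length
            if y not in done and cand < key.get(y, float("inf")):
                key[y] = cand
                heapq.heappush(heap, (cand, y))
    return done


def inf_minimizer_given_alpha(adj, v0, alpha):
    """Midpoint of vLow and vHigh on free vertices; terminals keep their given values."""
    v_low = _propagate(adj, v0, alpha)
    negated = _propagate(adj, {t: -val for t, val in v0.items()}, alpha)
    v_high = {x: -val for x, val in negated.items()}
    return {x: v0[x] if x in v0 else 0.5 * (v_low[x] + v_high[x]) for x in adj}
```

When `alpha` is at least the largest terminal-path gradient, every edge of the returned assignment has absolute
gradient at most `alpha`; a larger `alpha` still yields a valid, but not minimal, Lipschitz extension.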


Algorithm 6: Algorithm CompHighPressGraph$(G, v_0, \alpha)$: Given a well-posed instance $(G, v_0)$ and a gradient value $\alpha \ge 0$,
outputs a minimal induced subgraph $G'$ of $G$ where every vertex has pressure$[v_0](\cdot) > \alpha$.

1. (vLow, LParent) $\leftarrow$ CompVLow$(G, v_0, \alpha)$
2. (vHigh, HParent) $\leftarrow$ CompVHigh$(G, v_0, \alpha)$
3. $V_{G'} \leftarrow \{x \in V_G \mid \mathrm{vHigh}(x) > \mathrm{vLow}(x)\}$
4. $E_{G'} \leftarrow \{(x, y) \in E_G \mid x, y \in V_{G'}\}$.
5. $G' \leftarrow (V_{G'}, E_{G'}, \ell)$
6. return $G'$


Proof of Lemma 4.3: The inequality

$$\mathrm{vHigh}[\alpha](x) > \mathrm{vLow}[\alpha](x)$$

is equivalent to

$$\max_{t \in T(v_0)} \{v_0(t) - \alpha \cdot \mathrm{dist}(t, x)\} > \min_{t \in T(v_0)} \{v_0(t) + \alpha \cdot \mathrm{dist}(x, t)\},$$

which implies that there exist terminals $s, t \in T(v_0)$ such that

$$v_0(t) - \alpha \cdot \mathrm{dist}(t, x) > v_0(s) + \alpha \cdot \mathrm{dist}(x, s),$$

thus

$$\mathrm{pressure}[v_0](x) \ge \frac{v_0(t) - v_0(s)}{\mathrm{dist}(t, x) + \mathrm{dist}(x, s)} > \alpha.$$

So the inequality on vHigh and vLow implies that pressure is strictly greater than $\alpha$. On the other hand, if
pressure$[v_0](x) > \alpha$, there exist terminals $s, t \in T(v_0)$ such that

$$\frac{v_0(t) - v_0(s)}{\mathrm{dist}(t, x) + \mathrm{dist}(x, s)} = \mathrm{pressure}[v_0](x) > \alpha.$$

Hence,

$$v_0(t) - \alpha \cdot \mathrm{dist}(t, x) > v_0(s) + \alpha \cdot \mathrm{dist}(x, s),$$

which implies vHigh$[\alpha](x) >$ vLow$[\alpha](x)$. □


Algorithm 7: Algorithm SteepestPath$(G, v_0)$: Given a well-posed instance $(G, v_0)$, with $T(v_0) \ne V_G$, outputs a steepest
free terminal path $P$ in $(G, v_0)$.

1. Sample uniformly random $e \in E_G$. Let $e = (x_1, x_2)$.
2. Sample uniformly random $x_3 \in V_G$.
3. for $i = 1$ to $3$
4.     $P_i \leftarrow$ VertexSteepestPath$(G, v_0, x_i)$
5. Let $j \in \arg\max_{j \in \{1, 2, 3\}} \nabla P_j(v_0)$
6. $G' \leftarrow$ CompHighPressGraph$(G, v_0, \nabla P_j(v_0))$
7. if $E_{G'} = \emptyset$,
8.     then return $P_j$
9.     else return SteepestPath$(G', v_0|_{V_{G'}})$


Algorithm 8: Algorithm CompLexMin$(G, v_0)$: Given a well-posed instance $(G, v_0)$, with $T(v_0) \ne V_G$, outputs $\mathrm{lex}_G[v_0]$.

1. while $T(v_0) \ne V_G$
2.     $E_G \leftarrow E_G \setminus (T(v_0) \times T(v_0))$
3.     $P \leftarrow$ SteepestPath$(G, v_0)$
4.     $v_0 \leftarrow \mathrm{fix}[v_0, P]$
5. return $v_0$


Algorithm 9: Algorithm VertexSteepestPath$(G, v_0, x)$: Given a well-posed instance $(G, v_0)$, and a vertex $x \in V_G$,
outputs a steepest terminal path in $(G, v_0)$ through $x$.

1. Using Dijkstra’s algorithm, compute dist$(x, t)$ for all $t \in T(v_0)$
2. if $x \in T(v_0)$
3.     $y \leftarrow \arg\max_{y \in T(v_0)} \frac{|v_0(x) - v_0(y)|}{\mathrm{dist}(x, y)}$
4.     if $v_0(x) \ge v_0(y)$
5.         then return a shortest path from $x$ to $y$
6.         else return a shortest path from $y$ to $x$
7. else
8.     for $t \in T(v_0)$, $d(t) \leftarrow \mathrm{dist}(x, t)$
9.     $(t_1, t_2) \leftarrow$ StarSteepestPath$(T(v_0), v_0|_{T(v_0)}, d)$
10.     Let $P_1$ be a shortest path from $t_1$ to $x$. Let $P_2$ be a shortest path from $x$ to $t_2$.
11.     $P \leftarrow (P_1, P_2)$. return $P$.


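Algorithm 10: StarSteepestPath$(T, v, d)$: Returns the steepest path in a star graph, with a single non-terminal connected
to terminals in $T$, with lengths given by $d$, and voltages given by $v$.

1. Sample $t_1$ uniformly at random from $T$
2. Compute $t_2 \in \arg\max_{t \in T} \frac{|v(t_1) - v(t)|}{d(t_1) + d(t)}$
3. $\alpha \leftarrow \frac{|v(t_2) - v(t_1)|}{d(t_1) + d(t_2)}$
4. Compute $v_{\mathrm{low}} \leftarrow \min_{t \in T} (v(t) + \alpha \cdot d(t))$
5. $T_{\mathrm{low}} \leftarrow \{t \in T \mid v(t) > v_{\mathrm{low}} + \alpha \cdot d(t)\}$
6. Compute $v_{\mathrm{high}} \leftarrow \max_{t \in T} (v(t) - \alpha \cdot d(t))$
7. $T_{\mathrm{high}} \leftarrow \{t \in T \mid v(t) < v_{\mathrm{high}} - \alpha \cdot d(t)\}$
8. $T' \leftarrow T_{\mathrm{low}} \cup T_{\mathrm{high}}$
9. if $T' = \emptyset$
10.     then if $v(t_1) \ge v(t_2)$ then return $(t_1, t_2)$ else return $(t_2, t_1)$
11.     else return StarSteepestPath$(T', v|_{T'}, d|_{T'})$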
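
The randomized pruning in this procedure can be sketched in Python as follows. This is an illustration, not the
authors' code, and it deviates from the pseudocode in two hedged ways: the recursion is written as a loop that
remembers the steepest pair seen so far, and the sampled terminal is always dropped from the candidate set, which
guards against non-termination caused by floating-point ties. The gradient of a pair is taken to be the voltage
difference divided by the sum of the two lengths to the center, distances are assumed positive, and the function name
and arguments are hypothetical.

```python
import random


def star_steepest_path(T, v, d, rng=None):
    """Return (high, low): the terminal pair maximizing (v[high] - v[low]) / (d[high] + d[low])."""
    rng = rng or random.Random()
    cand = list(T)
    best = None  # (gradient, high terminal, low terminal)
    while cand:
        t1 = rng.choice(cand)
        t2 = max(cand, key=lambda t: abs(v[t1] - v[t]) / (d[t1] + d[t]))
        alpha = abs(v[t1] - v[t2]) / (d[t1] + d[t2])
        if best is None or alpha > best[0]:
            hi, lo = (t1, t2) if v[t1] >= v[t2] else (t2, t1)
            best = (alpha, hi, lo)
        alpha = best[0]
        v_low = min(v[t] + alpha * d[t] for t in cand)
        v_high = max(v[t] - alpha * d[t] for t in cand)
        # Keep only terminals that are an endpoint of some path steeper than alpha.
        cand = [t for t in cand if t != t1 and
                (v[t] > v_low + alpha * d[t] or v[t] < v_high - alpha * d[t])]
    return best[1], best[2]
```

Each round discards every terminal that cannot be an endpoint of a path steeper than the best one found so far, so
the candidate set shrinks until the steepest pair is confirmed.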


B.l Faster Lex-minimization 


Algorithm 11: Algorithm CompFastLexMin$(G, v_0)$: Given a well-posed instance $(G, v_0)$, with $T(v_0) \ne V_G$, outputs
$\mathrm{lex}_G[v_0]$.

1. while $T(v_0) \ne V_G$
2.     $v_0 \leftarrow$ FixPathsAbovePress$(G, v_0, 0)$
3. return $v_0$


Algorithm 12: Algorithm FixPathsAbovePress$(G, v_0, \alpha)$: Given a well-posed instance $(G, v_0)$, with $T(v_0) \ne V_G$, and
a gradient value $\alpha$, iteratively fixes all paths with gradient $> \alpha$.

1. while $T(v_0) \ne V_G$
2.     $E_G \leftarrow E_G \setminus (T(v_0) \times T(v_0))$
3.     Sample uniformly random $e \in E_G$. Let $e = (x_1, x_2)$.
4.     Sample uniformly random $x_3 \in V_G$.
5.     for $i = 1$ to $3$
6.         $P_i \leftarrow$ VertexSteepestPath$(G, v_0, x_i)$
7.     Let $j \in \arg\max_{j \in \{1, 2, 3\}} \nabla P_j(v_0)$
8.     $G' \leftarrow$ CompHighPressGraph$(G, v_0, \nabla P_j(v_0))$
9.     if $E_{G'} = \emptyset$,
10.         then $v_0 \leftarrow \mathrm{fix}[v_0, P_j]$
11.         else Let $G'_i$, $i = 1, \ldots, r$ be the connected components of $G'$.
12.             for $i = 1, \ldots, r$
13.                 $v_i \leftarrow$ FixPathsAbovePress$(G'_i, v_0|_{V_{G'_i}}, \nabla P_j(v_0))$
14.                 for $x \in V_{G'_i}$, set $v_0(x) \leftarrow v_i(x)$
15.     if $\alpha > 0$ then $G \leftarrow$ CompHighPressGraph$(G, v_0, \alpha)$
16. return $v_0$


C Experiments on WebSpam: Testing More Algorithms 

For completeness, in this appendix we show how a number of algorithms perform on the web spam experiment of 
Section 6. We consider the following algorithms: 

• Rand Walk along in-links. For a detailed description see Zhou et al. (2007). This algorithm essentially per¬ 
forms a Personalized PageRank random walk from each vertex x and computes a spam-value for the vertex x by 
taking a weighted average of the labels of the vertices where the random walk from x terminates. Also shown in 
Section 6. 

• DirectedLex, with edges in the opposite directions of links. This has the effect that a link to a spam host is 
evidence of spam, and a link from a normal host is evidence of normality. Also shown in Section 6. 

• RandWalk along out-links. 

• DirectedLex, with edges in the directions of links. This has the effect that a link from a spam host is
evidence of spam, and a link to a normal host is evidence of normality.

• UndirectedLex: Lex-minimization with links treated as undirected edges. 

• Laplacian: $\ell_2$-regression with links treated as undirected edges.

• Directed 1-Nearest Neighbor: Uses shortest distance along paths following out-links. Spam-ratio is
defined as the distance from normal hosts divided by the distance to spam hosts. Sites are flagged as spam when the
spam-ratio exceeds some threshold. We also tried following paths along in-links instead, but that gave much worse
results.

We use the experimental setup described in Section 6. Results are shown in Figure 4. The alternative convention
for DirectedLex orients edges in the directions of links. This takes a link from a spam host to be evidence of
spam, and a link to a normal host to be evidence of normality. This approach performs significantly worse than our
preferred convention, as one would intuitively expect. The UndirectedLex and Laplacian approaches also perform
significantly worse. Directed 1-Nearest Neighbor performs poorly, demonstrating that DirectedLex is very
different from that approach. As observed by Zhou et al. (2007), sampling based on a random walk following out-links
performs worse than following in-links. Up to 60% recall, DirectedLex performs best, both in the regime of 5%
labels for training and in the regime of 20% labels for training.

[Figure 4, two panels: "5% labels for training" and "20% labels for training".]

Figure 4: Recall and precision in the WebSpam classification experiment. Each data point shown was computed as an average
over 100 runs. The largest standard deviation of the mean precision across the plotted recall values was less than 1.5%. The
algorithm of Zhou et al. (2007) appears as RandWalk (along in-links). We also show RandWalk along out-links. Our directed
lex-minimization algorithm appears as DirectedLex. We also show DirectedLex with link directions reversed, along with
UndirectedLex and Laplacian.


D $\ell_0$-Vertex Regularization Proofs

In this appendix, we prove Theorem 7.1 and Theorem 7.2. For the purposes of proving the second theorem, we introduce
an alternative version of problem (3). This optimization problem asks for the minimum $\ell_0$-regularization
budget required to obtain an inf-minimizer with gradient at most a given threshold $\alpha$:

$$\min_{v \in \mathbb{R}^n} \|v(T) - v_0(T)\|_0 \quad \text{subject to} \quad \|\mathrm{grad}_G[v]\|_\infty \leq \alpha. \tag{6}$$


We will also need the following graph construction. 

Definition D.1 The $\alpha$-pressure terminal graph of a partially-labeled graph $(G, v_0)$ is a directed unweighted graph
$G_\alpha = (T(v_0), E)$ such that $(s, t) \in E$ if and only if there is a terminal path $P$ from $s$ to $t$ in $G$ with

$$\nabla P(v_0) > \alpha.$$

Note that the $\alpha$-pressure terminal graph has $O(n)$ vertices but may be dense, even when $G$ is not.


Algorithm 13: Algorithm Term-Pressure: Given a well-posed instance $(G, v_0)$ and $\alpha \geq 0$, outputs a pressure terminal
graph $G_\alpha$.

Initialize $G_\alpha$ with vertex set $V_\alpha = T(v_0)$ and edge set $E = \emptyset$.
for each terminal $s \in T(v_0)$

1. Compute the distances to every other terminal $t$ by running Dijkstra's algorithm, allowing shortest paths
that run through other terminals.

2. Use the resulting distances to check for every other terminal $t$ if there is a terminal path $P$ from $s$ to $t$ with
$\nabla P(v_0) > \alpha$. If there is, add edge $(s, t)$ to $E$.


Lemma D.2 The $\alpha$-pressure terminal graph of a voltage problem $(G, v_0)$ can be computed in $O((m + n \log n)n)$ time
using algorithm Term-Pressure (Algorithm 13).

Proof: The correctness of the algorithm follows from the fact that Dijkstra's algorithm will identify all shortest
distances between the terminals, and the pressure check will ensure that terminal pairs $(s, t)$ are added to $E$ if and
only if they are the endpoints of a terminal path $P$ with $\nabla P(v_0) > \alpha$. The running time is dominated by performing
Dijkstra's algorithm once for each terminal. A single run of Dijkstra's algorithm takes $O(m + n \log n)$ time, and this
is performed at most $n$ times, for a total running time of $O((m + n \log n)n)$. □
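
The following is a minimal Python sketch of Term-Pressure, not the authors' implementation. It assumes a representation the text does not fix: the graph is an adjacency dictionary mapping each vertex to (neighbor, length) pairs, the terminals are the keys of the dictionary `v0`, and the gradient of a terminal pair is taken to be $(v_0(s) - v_0(t))$ divided by the shortest-path length from $s$ to $t$. The names `dijkstra` and `term_pressure` are illustrative. A binary heap is used, so one run costs $O((m + n) \log n)$ rather than the $O(m + n \log n)$ quoted in Lemma D.2.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path lengths from source; adj maps vertex -> [(neighbor, length)]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path to u was already found
        for w, length in adj[u]:
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def term_pressure(adj, v0, alpha):
    """Edge set of the alpha-pressure terminal graph, as a set of (s, t) pairs."""
    edges = set()
    for s in v0:
        dist = dijkstra(adj, s)  # paths may run through other terminals
        for t in v0:
            if t == s or t not in dist:
                continue
            # Edge lengths are assumed positive, so dist[t] > 0 for t != s.
            if (v0[s] - v0[t]) / dist[t] > alpha:
                edges.add((s, t))
    return edges


# Toy instance (an assumption for illustration): path a - x - b - c with unit
# lengths, where a, b, c are terminals and x is unlabeled.
adj = {
    "a": [("x", 1.0)],
    "x": [("a", 1.0), ("b", 1.0)],
    "b": [("x", 1.0), ("c", 1.0)],
    "c": [("b", 1.0)],
}
v0 = {"a": 4.0, "b": 1.0, "c": 0.0}
print(sorted(term_pressure(adj, v0, 1.0)))
```

On this toy instance the pair (b, c) has gradient exactly 1, so it is excluded at $\alpha = 1$ because the definition uses a strict inequality.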

We make three observations that will turn out to be crucial for proving Theorems 7.1 and 7.2. 

Observation D.3 $G_\alpha$ is a subgraph of $G_\beta$ for $\alpha \geq \beta$.

Proof: Suppose edge $(s, t)$ appears in $G_\alpha$. Then for some path $P$

$$\nabla P(v_0) > \alpha \geq \beta,$$

so the edge also appears in $G_\beta$. □


Observation D.4 $G_\alpha$ is transitively closed.

Proof: Suppose edges $(s, t)$ and $(t, r)$ appear in $G_\alpha$. Let $P_{(s,t)}$, $P_{(t,r)}$, $P_{(s,r)}$ be the respective shortest paths in $G$
between these terminal pairs. Then

$$\frac{v_0(s) - v_0(r)}{\ell(P_{(s,r)})} \geq \frac{v_0(s) - v_0(t) + v_0(t) - v_0(r)}{\ell(P_{(s,t)}) + \ell(P_{(t,r)})} > \frac{\alpha \ell(P_{(s,t)}) + \alpha \ell(P_{(t,r)})}{\ell(P_{(s,t)}) + \ell(P_{(t,r)})} = \alpha. \tag{7}$$

So edge $(s, r)$ also appears in $G_\alpha$. This is sufficient for $G_\alpha$ to be transitively closed. □

Observation D.5 $G_\alpha$ is a directed acyclic graph.

Proof: Suppose for a contradiction that a directed cycle appears in $G_\alpha$. Let $s$ and $t$ be two vertices in this cycle. Let
$P_{(s,t)}$ and $P_{(t,s)}$ be the respective shortest paths in $G$ between these terminal pairs. Because $G_\alpha$ is transitively closed,
both edges $(s, t)$ and $(t, s)$ must appear in $G_\alpha$. But $(s, t) \in E$ implies

$$v_0(s) - v_0(t) > \alpha \ell(P_{(s,t)}) \geq 0,$$

and similarly $(t, s) \in E$ implies

$$v_0(t) - v_0(s) > \alpha \ell(P_{(t,s)}) \geq 0.$$

This is a contradiction. □
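
Observations D.3 to D.5 can be sanity-checked numerically. The sketch below is only an illustration under a simplifying assumption: the terminals sit on a line, so the shortest-path length between two terminals is the absolute difference of their positions. The helper names and the toy values are hypothetical. Because the edge set is transitively closed, checking for the absence of 2-cycles is enough to rule out longer cycles, exactly as in the proof of Observation D.5.

```python
from itertools import permutations


def pressure_edges(pos, val, alpha):
    """Edges (s, t) with (val[s] - val[t]) / |pos[s] - pos[t]| > alpha."""
    return {
        (s, t)
        for s, t in permutations(pos, 2)
        if (val[s] - val[t]) / abs(pos[s] - pos[t]) > alpha
    }


def is_transitively_closed(edges):
    """True if (s, t) and (t, r) in edges always implies (s, r) in edges."""
    return all(
        (s, r) in edges
        for (s, t) in edges
        for (t2, r) in edges
        if t == t2
    )


def has_two_cycle(edges):
    """True if some pair appears in both directions."""
    return any((t, s) in edges for (s, t) in edges)


# Toy terminals on a line (positions) with their voltages (values).
pos = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 4.0}
val = {"a": 6.0, "b": 4.0, "c": 1.0, "d": 1.5}

coarse = pressure_edges(pos, val, 1.0)
fine = pressure_edges(pos, val, 1.4)
print(sorted(coarse))
print(fine <= coarse, is_transitively_closed(coarse), has_two_cycle(coarse))
```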


The usefulness of the $\alpha$-pressure terminal graph is captured in the following lemma. We define a vertex cover of a
directed graph to be a vertex set that constitutes a vertex cover in the same graph with all edges taken to be undirected.

Lemma D.6 Given a partially-labeled graph $(G, v_0)$ and a set $U \subseteq V$, there exists a voltage assignment $v \in \mathbb{R}^n$ that
satisfies

$$\{t \in T(v_0) : v(t) \neq v_0(t)\} \subseteq U \quad \text{and} \quad \|\mathrm{grad}_G[v]\|_\infty \leq \alpha,$$
if and only if $U$ is a vertex cover in the $\alpha$-pressure terminal graph $G_\alpha$ of $(G, v_0)$.

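Proof: We first show the “only if” direction. Suppose for a contradiction that there exists a voltage assignment $v$ for
which $\{t \in T(v_0) : v(t) \neq v_0(t)\} \subseteq U$ and $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha$, but $U$ is not a vertex cover in $G_\alpha$. Let $(s, t)$ be an edge
of $G_\alpha$ which is not covered by $U$. The presence of this edge in $G_\alpha$ implies that there exists a terminal path $P$ from $s$ to $t$
in $G$ for which

$$\nabla P(v_0) > \alpha.$$

But, by Lemma 3.5, this means there is no assignment $v$ for $G$ which agrees with $v_0$ on $s$ and $t$ and has
$\|\mathrm{grad}_G[v]\|_\infty \leq \alpha$. Since neither $s$ nor $t$ lies in $U$, our $v$ agrees with $v_0$ on both. This contradicts our assumption.

Now we show the “if” direction. Consider an arbitrary vertex cover $U$ of $G_\alpha$. Suppose for a contradiction that
there does not exist a voltage assignment $v$ for $G$ with

$$\|\mathrm{grad}_G[v]\|_\infty \leq \alpha \quad \text{and} \quad \{t \in T(v_0) : v(t) \neq v_0(t)\} \subseteq U.$$

Define a partial voltage assignment $v_U$ given by

$$v_U(t) = \begin{cases} v_0(t) & \text{if } t \in T(v_0) \setminus U, \\ * & \text{o.w.} \end{cases}$$

The preceding statement is equivalent to saying that there is no $v$ that extends $v_U$ and has $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha$. By
Lemma 3.5, this means there is a terminal path between $s, t \in T(v_U)$ with gradient strictly larger than $\alpha$. But this
means an edge $(s, t)$ is present in $G_\alpha$ and is not covered. This contradicts our assumption that $U$ is a vertex cover. □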

We are now ready to prove Theorem 7.2. 

Proof of Theorem 7.2: We describe and prove the algorithm Outlier. The algorithm will reduce problem (3)
to problem (6): Suppose $v^*$ is an optimal assignment for problem (3). It achieves a maximum gradient $\alpha^* =
\|\mathrm{grad}_G[v^*]\|_\infty$. Using Dijkstra's algorithm we compute the pairwise shortest distances between all terminals in $G$.
From these distances and the terminal voltages, we compute the gradient on the shortest path between each terminal
pair. By Lemma 3.5, $\alpha^*$ must equal one of these gradients. So we can solve problem (3) by iterating over the set of
gradients between terminals and solving problem (6) for each of these $O(n^2)$ gradients. Among the assignments with
$\|v(T) - v_0(T)\|_0 \leq k$, we then pick the solution that minimizes $\|\mathrm{grad}_G[v]\|_\infty$.

In fact, we can do better. By Observation D.3, $G_\alpha$ is a subgraph of $G_\beta$ for $\alpha \geq \beta$. This means a vertex cover
of $G_\beta$ is also a vertex cover of $G_\alpha$, and hence the minimum vertex cover for $G_\beta$ is at least as large as the minimum
vertex cover for $G_\alpha$. This means we can do a binary search on the set of $O(n^2)$ terminal gradients to find the minimum
gradient for which there exists an assignment with $\|v(T) - v_0(T)\|_0 \leq k$. This way, we only make $O(\log n)$ calls to
problem (6), in order to solve problem (3).
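
The binary search described above can be sketched as follows. This is an illustration, not the authors' code: `min_outliers` is a hypothetical oracle standing in for a solver of problem (6), returning the minimum number of terminals that must change to reach gradient at most `alpha`, and it is assumed to be non-increasing in `alpha`, which is what Observation D.3 provides.

```python
def smallest_feasible_gradient(candidates, min_outliers, k):
    """Smallest candidate alpha with min_outliers(alpha) <= k, or None.

    candidates: iterable of candidate gradient values (for instance the
        gradients between terminal pairs).
    min_outliers: hypothetical oracle for problem (6); must be
        non-increasing in alpha.
    """
    alphas = sorted(set(candidates))
    lo, hi = 0, len(alphas)
    # Invariant: every index below lo is infeasible, every index from hi on
    # is feasible (hi == len(alphas) means no feasible index is known yet).
    while lo < hi:
        mid = (lo + hi) // 2
        if min_outliers(alphas[mid]) <= k:
            hi = mid
        else:
            lo = mid + 1
    return alphas[lo] if lo < len(alphas) else None


def toy_oracle(alpha):
    """Made-up monotone oracle, used only to exercise the search."""
    if alpha < 1.0:
        return 3
    if alpha < 2.5:
        return 1
    return 0


print(smallest_feasible_gradient([4.0, 0.5, 2.5, 1.0], toy_oracle, 1))
```

With $O(n^2)$ candidates the loop runs $O(\log n)$ times, so the oracle is called $O(\log n)$ times.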

We use the following algorithm to solve problem (6). 


1. Compute the $\alpha$-pressure terminal graph $G_\alpha$ of $G$ using the algorithm Term-Pressure.

2. Compute a minimum vertex cover $U$ of $G_\alpha$ using the algorithm Konig-Cover from Theorem 7.3.

3. Define a partial voltage assignment $v_U$ given by

$$v_U(t) = \begin{cases} v_0(t) & \text{if } t \in T(v_0) \setminus U, \\ * & \text{otherwise.} \end{cases}$$

4. Using Algorithm 5, compute voltages $v$ that extend $v_U$ and output $v$.


From Lemma D.2, it follows that step 1 computes the $\alpha$-pressure terminal graph in polynomial time. From Theorem 7.3
it follows that step 2 computes a minimum vertex cover of the $\alpha$-pressure terminal graph in polynomial
time, because Observations D.4 and D.5 establish that the graph is a TC-DAG. From Lemma D.6 and Theorem 4.6,
it follows that the output voltages solve program (6). □


To prove Theorem 7.1, we use the standard greedy approximation algorithm for MIN-VC (Vazirani (2001)).

Theorem D.7 2-Approximation Algorithm for Vertex Cover. The following algorithm gives a 2-approximation to
the Minimum Vertex Cover problem on a graph $G = (V, E)$.


0. Initialize $U = \emptyset$.

1. Pick an edge $(u, v) \in E$ that is not covered by $U$.

2. Add $u$ and $v$ to the set $U$.

3. Repeat from step 1 if there are still edges not covered by $U$.

4. Output $U$.
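
The steps of Theorem D.7 translate almost directly into code. The sketch below assumes the graph is given as a list of vertex pairs; the function name is illustrative. A single pass over the edges suffices, because an edge that is covered when it is visited stays covered.

```python
def greedy_vertex_cover(edges):
    """2-approximate vertex cover: add both endpoints of every uncovered edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge (u, v) is not yet covered
            cover.update((u, v))
    return cover


# Path 1 - 2 - 3 - 4: an optimal cover is {2, 3}; the greedy cover has size 4.
path_edges = [(1, 2), (2, 3), (3, 4)]
path_cover = greedy_vertex_cover(path_edges)
print(sorted(path_cover))
```

The edges chosen in step 1 form a matching, and any vertex cover must contain at least one endpoint of each matching edge, which is where the factor of 2 comes from.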


We are now in a position to prove Theorem 7.1.

Proof of Theorem 7.1: Given an arbitrary $k$ and a partially-labeled graph $(G, v_0)$, let $\alpha^*$ be the optimum value
of program (3). Observe that by Lemma D.6, this implies that $G_{\alpha^*}$ has a vertex cover of size $k$. Given the partial
assignment $v_0$, for every vertex set $U$, we define

$$v_U(t) = \begin{cases} v_0(t) & \text{if } t \in T(v_0) \setminus U, \\ * & \text{otherwise.} \end{cases}$$

We claim the following algorithm Approx-Outlier outputs a voltage assignment $v$ with $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha^*$
and $\|v(T) - v_0(T)\|_0 \leq 2k$.


Algorithm Approx-Outlier:

0. Initialize $U = \emptyset$.

1. Using the algorithm SteepestPath (Algorithm 7), find a steepest terminal path in $G$ w.r.t. $v_U$. Denote
this path $P$ and let $s$ and $t$ be its terminal endpoints. If there is no terminal path with positive gradient, skip
to step 4.

2. Add $s$ and $t$ to the set $U$.

3. If $|U| \leq 2k - 2$ then repeat from step 1.

4. Using the algorithm CompInfMin (Algorithm 5), compute voltages $v$ that extend $v_U$ and output $v$.


From the stopping conditions, it is clear that $|U| \leq 2k$. If in step 1 we ever find that no terminal paths have positive
gradient, then our $v$ that extends $v_U$ will have $\|\mathrm{grad}_G[v]\|_\infty = 0 \leq \alpha^*$, by Lemma 3.5. Similarly, if we find a steepest
path with gradient at most $\alpha^*$ w.r.t. $v_U$, then for this $U$ there exists $v$ that extends $v_U$ and has $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha^*$.
This will continue to hold if we add vertices to $U$. Therefore, for the final $U$, there will exist a $v$ that extends
$v_U$ and has $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha^*$.

If we never find a steepest terminal path $P$ with $\nabla P(v_0) \leq \alpha^*$, then each steepest path we find corresponds to an
edge in $G_{\alpha^*}$ that is not yet covered by $U$, and our algorithm in fact implements the greedy approximation algorithm
for vertex cover described in Theorem D.7. This implies that the final $U$ is a vertex cover of $G_{\alpha^*}$ of size at most $2k$.
By Lemma D.6, this implies that there exists a voltage assignment $u$ extending $v_U$ that has $\|\mathrm{grad}_G[u]\|_\infty \leq \alpha^*$. This
implies by Theorem 4.6 that the $v$ we output has $\|\mathrm{grad}_G[v]\|_\infty \leq \alpha^*$.

In all cases, the $v$ we output extends $v_U$, so $\|v(T) - v_0(T)\|_0 \leq |U| \leq 2k$. □


E Proof of Hardness of $\ell_0$ Regularization for $\ell_2$

We will prove Theorem 7.4 by a reduction from minimum bisection. To this end, let $G = (V, E)$ be any graph. We
will reduce the minimum bisection problem on $G$ to our regularization problem. Let $n = |V|$. The graph on which we
will perform regularization will have vertex set

$$V \cup \hat{V},$$

where $\hat{V}$ is a set of $n$ vertices that are in 1-to-1 correspondence with $V$. We assume that every edge in $G$ has weight 1.
We now connect every vertex in $V$ to the corresponding vertex in $\hat{V}$ by an edge of weight $B$, for some large $B$ to be
determined later. We also connect all of the vertices in $\hat{V}$ to each other by edges of weight $B^3$. So, we have a complete
graph of weight $B^3$ edges on $\hat{V}$, a matching of weight $B$ edges connecting $\hat{V}$ to $V$, and the original graph $G$ on $V$.
The input potential function will be

$$v(a) = \begin{cases} 0 & \text{for } a \in \hat{V}, \text{ and} \\ 1 & \text{for } a \in V. \end{cases}$$

Now set $k = n/2$. We claim that we will be able to determine the value of the minimum bisection from the solution
to the regularization problem.
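
As an illustration, the construction can be written out as a small Python sketch. The representation is an assumption made here: vertices of $G$ are numbered $0, \ldots, n-1$, a vertex of $V$ is tagged `("orig", i)`, its partner in the second vertex set is tagged `("copy", i)`, and edge weights are stored in a dictionary keyed by vertex pairs. Following the claims that come next, the copy set carries potential 0 and $V$ carries potential 1. The function name is hypothetical.

```python
def build_reduction_instance(n, edges, B):
    """Return (weights, potential, k) for the bisection reduction.

    n: number of vertices of G, labeled 0..n-1.
    edges: iterable of pairs (a, b) giving the edges of G.
    B: the large weight parameter.
    """
    weights = {}
    for a, b in edges:  # original graph G on V, weight 1
        weights[(("orig", a), ("orig", b))] = 1
    for i in range(n):  # matching between V and its copy, weight B
        weights[(("orig", i), ("copy", i))] = B
    for i in range(n):  # complete graph on the copy, weight B^3
        for j in range(i + 1, n):
            weights[(("copy", i), ("copy", j))] = B ** 3
    potential = {("orig", i): 1 for i in range(n)}
    potential.update({("copy", i): 0 for i in range(n)})
    return weights, potential, n // 2


# A 4-cycle with B = 2 n^3, the value used later in Lemma E.6.
n = 4
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights, potential, k = build_reduction_instance(n, cycle, 2 * n ** 3)
print(len(weights), k)
```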

If $S$ is the set of vertices on which $v$ and the regularized solution $w$ differ, then we know that $w$ is harmonic on $S$: for every $a \in S$,
$w(a)$ is the weighted average of the values at its neighbors. In the following, we exploit the fact that $|S| \leq n/2$.

Claim E.1 For every $a \in S \cap \hat{V}$, $w(a) \leq 2/(nB^2)$.

Proof: Let $a$ be the vertex in $S \cap \hat{V}$ that maximizes $w(a)$. So, $a$ is connected to at least $n/2$ neighbors in $\hat{V}$ with
$w$-value equal to 0 by edges of weight $B^3$. On the other hand, $a$ has only one neighbor that is not in $\hat{V}$, that vertex has
$w$-value at most 1, and it is connected to that vertex by an edge of weight $B$. Call that vertex $c$. We have

$$\begin{aligned}
((n-1)B^3 + B)\, w(a) &= B w(c) + \sum_{b \in \hat{V},\, b \neq a} B^3 w(b) \\
&= B w(c) + \sum_{b \in \hat{V} \cap S,\, b \neq a} B^3 w(b) + \sum_{b \in \hat{V} - S} B^3 w(b) \\
&\leq B + \sum_{b \in \hat{V} \cap S,\, b \neq a} B^3 w(a) \\
&\leq B + (n/2 - 1) B^3 w(a).
\end{aligned}$$

Subtracting $(n/2 - 1) B^3 w(a)$ from both sides gives

$$((n/2) B^3 + B)\, w(a) \leq B,$$

which implies the claim. □

Claim E.2 For $a \in S \cap V$, $w(a) \leq n/B$.

Proof: Vertex $a$ has exactly one neighbor in $\hat{V}$. Let's call that neighbor $c$. We know that $w(c) \leq 2/(B^2 n)$. On the
other hand, vertex $a$ has at most $n - 1$ neighbors in $V$, and each of these has $w$-value at most 1. Let $d_a$ denote the
degree of $a$ in $G$. Then,

$$(B + d_a)\, w(a) \leq d_a + B \cdot \frac{2}{B^2 n}.$$

So,

$$w(a) \leq \frac{d_a + 2/(Bn)}{d_a + B} \leq \frac{n + 2/(Bn)}{B + n} < n/B.$$

□


We now estimate the value of the regularized objective function. To this end, we assume that

$$|S| = k = n/2.$$

Let

$$T = S \cap V,$$

and

$$t = |T|.$$

We will prove that $S \subseteq V$ and so $S = T$ and $t = n/2$.

Let $\delta$ denote the number of edges on the boundary of $T$ in $V$. Once we know that $t = n/2$, $\delta$ is the size of a
bisection.


Claim E.3 The contribution of the edges between $V$ and $\hat{V}$ to the objective function is at least

$$(n - t)B - 4/B$$

and at most

$$(n - t)B + t n^2 / B.$$

Proof: For the lower bound, we just count the edges between vertices in $V \setminus T$ and $\hat{V}$. There are $n - t$ of these
edges, and each of them has weight $B$. The endpoint in $V \setminus T$ has $w$-value 1, and the endpoint in $\hat{V}$ has $w$-value at
most $2/(nB^2)$. So, the contribution of these edges is at least

$$(n - t)B\,(1 - 2/(nB^2))^2 \geq (n - t)B\,(1 - 4/(nB^2)) \geq (n - t)B - 4/B.$$

For the upper bound, we observe that the difference in $w$-values across each of these $n - t$ edges is at most 1, so their
total contribution is at most

$$(n - t)B.$$

Since for every vertex $a \in T$, $w(a) \leq n/B$, and also every vertex $b \in \hat{V}$, $w(b) \leq 2/(nB^2)$, the contribution due to
edges between $T$ and $\hat{V}$ is at most

$$t\,(n/B)^2 B = t n^2 / B.$$

□


We will see that this is the dominant term in the objective function. The next-most important term comes from the
edges in $G$.

Claim E.4 The contribution of the edges in $G$ to the objective function is at least

$$\delta\,(1 - 2n/B)$$

and at most

$$\delta + (t^2/2)(n/B)^2.$$

Proof: Let $(a, b) \in E$. If neither $a$ nor $b$ is in $T$, then $w(a) = w(b) = 1$, and so this edge has no contribution. If
$a \in T$ but $b \notin T$, then the difference in $w$-values on them is between $(1 - n/B)$ and 1. So, the contribution of such
edges to the objective function is between

$$\delta\,(1 - 2n/B) \quad \text{and} \quad \delta.$$

Finally, if $a$ and $b$ are in $T$, then the difference in $w$-values on them is at most $n/B$, and so the contribution of all such
edges to the objective function is at most

$$(t^2/2)(n/B)^2.$$

□


Claim E.5 The edges between pairs of vertices in $\hat{V}$ contribute at most $2/B$ to the objective function.

Proof: As $0 \leq w(a) \leq 2/(B^2 n)$ for every $a \in \hat{V}$, every edge between two vertices in $\hat{V}$ can contribute at most

$$B^3 (2/(B^2 n))^2 = 4/(B n^2).$$

As there are fewer than $n^2/2$ such edges, their total contribution to the objective function is at most

$$(n^2/2)(4/(B n^2)) = 2/B.$$

□


Lemma E.6 If $n \geq 4$ and $B = 2n^3$, the value of the objective function is at least

$$(n - t)B + \delta - 1/2$$

and at most

$$(n - t)B + \delta + 1/3.$$

Proof: Summing the contributions in the preceding three claims, we see that the value of the objective function is at
least

$$\begin{aligned}
(n - t)B - 4/B + \delta\,(1 - 2n/B) &\geq (n - t)B + \delta - 4/B - 2n\delta/B \\
&\geq (n - t)B + \delta - n^3/B \\
&\geq (n - t)B + \delta - 1/2,
\end{aligned}$$

as $\delta \leq (n/2)^2$.

Similarly, the objective function is at most

$$\begin{aligned}
(n - t)B + t n^2/B + \delta + (t^2/2)(n/B)^2 + 2/B &\leq (n - t)B + n^3/(2B) + \delta + n^4/(8B^2) + 2/B \\
&\leq (n - t)B + n^3/(2B) + \delta + 1/(32n^2) + 1/n^3 \\
&\leq (n - t)B + \delta + 1/3.
\end{aligned}$$
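
The constants in Lemma E.6 can be double-checked with exact rational arithmetic. The sketch below is a numerical sanity check rather than a proof: it evaluates the error terms at the extreme values $t = n/2$ and $\delta = (n/2)^2$ used in the argument, with $B = 2n^3$, over a finite range of $n$. The function name is illustrative.

```python
from fractions import Fraction


def slack_terms(n):
    """Exact error terms of Lemma E.6 for B = 2 n^3, t = n/2, delta = (n/2)^2.

    Returns (lower, upper): the amount subtracted in the lower bound and the
    amount added in the upper bound, relative to (n - t) B + delta.
    """
    B = 2 * n ** 3
    t = Fraction(n, 2)
    delta = Fraction(n, 2) ** 2
    lower = Fraction(4, B) + 2 * n * delta / B
    upper = t * n ** 2 / B + (t ** 2 / 2) * Fraction(n, B) ** 2 + Fraction(2, B)
    return lower, upper


for n in range(4, 200):
    lower, upper = slack_terms(n)
    assert lower <= Fraction(1, 2)
    assert upper <= Fraction(1, 3)

print(slack_terms(4))
```

Both slacks are below $1/2$, so the window around $(n - t)B + \delta$ has width less than 1.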


Claim E.7 If $n \geq 2$ and $B = 2n^3$, then $S \subseteq V$.

Proof: The objective function is minimized by making $t$ as large as possible, so $t = n/2$ and $S \subseteq V$. □

Theorem E.8 The value of the objective function reveals the value of the minimum bisection in $G$.

Proof: The value of the objective function will be between

$$(n/2)B + \delta - 1/2$$

and

$$(n/2)B + \delta + 1/3.$$

So, the objective function will be smallest when $\delta$ is as small as possible. Because $\delta$ is an integer and the two bounds
differ by less than 1, the minimum objective value determines the minimum $\delta$. □

Theorem E.8 immediately implies Theorem 7.4.




